LIST OF SYMBOLS



Symbols which are used only once in the text are not shown
below. Some symbols are used in more than one context, in which
case more than one definition appears next to that symbol.

A: Initial set-up costs 

B: A forcing set 

C: Sampling costs 

D: Annual educational costs 

D' : Costs of an educational sub-unit 

E : Expected value 

F: A cumulative density function 

K: Population with the highest mean 

M : Survival factor 

N: The size of the total available sample 

P: Probability; also, productivity 

R: Discount factor 

S: The sum of sample values 

T: Nominal learning time for an educational sub-unit 

U: The number of repetitions of an experiment 

V: Present worth of educational costs 

W: Present worth of expected life-cycle productive output

X: A sample value 

a: Age at which one starts an educational unit 

b: Retirement age 

c: A proportionality or weighting factor relating a sub-unit
to the unit

d: A transform giving equivalent dollars for any given dates 

f : A transform giving dollar values from productive output 

g: Grades 
D:



A transform relating current to previous grades 
(As a subscript) The identification tag for each of the 
samples from the j-th population; otherwise identifies 
an educational unit 

The identification tag for a population 

The total number of populations 

Identifies an educational sub-unit 

(Subscripted) The a priori sample mean; otherwise 

signifies years of experience 

The trial number or sample number in a sequence 
of samples 

Relative sampling cost; also, a transform giving projected 

values from a history of previous values 

Order of a polynomial 

Discount rate 

Sample standard deviation 

Learning time 

Identifies the particular repetition of an experiment 
A transform relating grades to subsequent output 
A random variable 

Current date; also date student will complete educational unit 
Date of starting productive output 
A dummy variable 
Personality factor 

Factor describing history of past performance 
Population mean 
A population 

Population standard deviation 

Years from the date of starting a given educational unit 
Product sign 












A : Designates group taking a specific educational unit 

*: Designates a matched group not taking a specific 

educational unit 

$: Reported median earnings 

"T: Idealized learning time for a given educational unit 

Designates the number of remaining observations 
SECTION I
INTRODUCTION 



In recent years engineers have become increasingly involved in
the study of adaptive teaching systems. There are research groups
involved in such studies at the University of Illinois' Coordinated Science
Laboratory, at the Massachusetts Institute of Technology, in numerous
other schools, and in engineering firms throughout the world.

A brief analysis will be made here of the earlier work, and 
some additional concepts on criteria functions, decision rules, and 
utility functions for adaptive educational systems will be introduced. 

In an educational context, the word "system" is used to describe
such diverse things as "The Blank County School System" and "Dash
Publishing Company's Self-Instructional System for Slide-Rule
Computations". For convenience in exposition "systems" will be roughly
divided into four categories.

Micro-micro systems: concerned with a transformation
of students' behavior by a single, relatively short
sequence of learning items.

Micro systems: concerned with a transformation of
students' behavior by a longer sequence of learning
items, such as are encountered in a semester course.

Macro systems: a collection of micro systems characterized
by a curriculum or curricula in a school, university or
school district.

Macro-macro systems: related to the transformation
of students' behavior by the total learning experience
encountered during the students' lives.

Almost all of the previous studies on adaptive decision
structures have been concerned with micro-micro systems. Criteria
functions have not been explicitly stated, and therefore the decision
rules have generally been non-optimal. Also, the transformations
achieved by the system have been measured in abstract units (such as
grades) which are left unrelated to value scales or "utility" outside of
the system, thus precluding external evaluation of the system. The
adaptive decision structure suggested in the following sections is
completely general and is intended for systems of all sizes. However,
since a utility function for the output of educational systems will be
introduced, and since data for this utility function are most readily
available for macro or macro-macro systems, the approach will be to
start with the larger systems and work down to the micro-micro systems.

Adaptive decisions have always existed in educational systems. 
Course content and pedagogical techniques have changed in response to 
changes in the social, cultural, economic and technological aspects of 
the environment. However, the rules governing such change have 
seldom been explicitly recognized or stated, and the information needed 
for making decisions has often been incomplete. An adaptive decision 
structure is one which removes much of the decision-making function 
from the intuitive realm by providing a plan for accumulating relevant 
data and using these data according to a preconceived plan or decision 
rule to change or rearrange sub-elements of the system in order to 
achieve a predetermined criterion in some optimal fashion. Furthermore,
this criterion should have some "utility" outside of the system.

An adaptive decision structure therefore requires: 

1. Data gathering and handling capability. 

2. A criterion function. 

3. Decision rules. 

4. A utility function. 

Only brief consideration will be given to the first of these
requirements. The assumption will be made here that the amount of data
that is ideally required for an adaptive decision structure is sufficiently
voluminous to require the use of modern data-processing equipment. 
Whether or not this data-processing equipment is also used for present- 
ing course content material directly to students (as in the computer- 
controlled "teaching machines") is a side issue to the main stream of 
thought. Questions of this type can be readily resolved by the techniques 
developed in the following sections. Another assumption that will per- 
vade all of the subsequent discussion is that flexible scheduling (for 
individual students, at any time during the school year) is a desirable 
feature of an educational system using an adaptive decision structure. 
The merit of this assumption will become clearer when the utility
function is described. The practical implementation of flexible scheduling
will undoubtedly be enhanced by the use of data-processing equipment. 

The main contributions of the following sections will be in the 
area of criteria functions, decision rules, and utility functions for 
adaptive decision structures, and can be summarized as: 

a. The description of a criterion function for an adaptive decision 
structure in an educational system where two processes are 
being carried out simultaneously, namely, (a) students are 
learning subject matter, and (b) the system controllers are 
learning about the students' learning. Process (b) may include
exploratory use of various alternative pedagogical procedures 
or subject matter, some of which may result in better student 
performance than others. In such a situation there is a trade- 
off between processes (a) and (b). The suggested criterion func- 
tion is the sum of the net utility of all students' outputs, and 
obviously this function should be maximized. 

b. The description of decision rules which tend to maximize the
criterion function under different conditions of a priori
information. In particular, some qualitative rules are obtained for
the case of "total a priori ignorance", i.e., where there is no
a priori information on the distribution of the net utility of
students' outputs. Also, an extension is made to the procedure
for two-stage sequential sampling from two normally distributed
populations to include the case where the cost of taking or
observing sample data is of some consequence. Of most interest
is the development here of a computational backwards-induction
solution for the multi-stage or continuous sampling procedure
from k normal populations. This solution is applicable to
problems outside the educational context and should be of interest
in such fields as medical testing, agricultural experiments,
production line evaluation and in many other fields where the
criterion is to maximize the sum of net outputs. The procedure
used can be generalized to binomial and other distributions.
c. The description of a utility function for converting such available 
measures as student grades, student learning time, teacher 
inputs, school capital and maintenance costs, etc. into a net 
value of the transformation effected by the system. Current 
measures of student output are used to derive a present worth of 
the student's expected life-cycle productive output (PWSELCPO) 
and these are compared with PWSELCPO for alternate system 
configurations.

In order to assign a value to the net output from an educational 
system one not only needs a utility function but also data to feed into 
such a function. Many of the necessary data are currently nonexistent 
or otherwise unavailable. Therefore, some rather strong restrictions 
must be imposed on the utility function so that it can operate with 
reduced precision with existing data. The important point to note at this 
stage is that in at least one case there is probably enough information 
available to start using an adaptive decision structure which includes a 
utility function. That case is in the field of engineering education,
where most data exist relating students' performance in school to
subsequent life-cycle productive outputs. A start must be made
somewhere; otherwise there is little prospect that the additional data
required for a more precise utility function will ever be accumulated. 
It may seem like a boot-strap operation to prescribe such a structure 
from incomplete information, but an adaptive decision structure is 
dedicated to making decisions in the face of uncertainty or incomplete 
information. Adaptive decision theorists suggest policy iteration [1] 
as a means of sequentially approaching the desired end. 




SECTION II 



BACKGROUND 



A. The Three Approaches 



In the last few years, many people have talked about the
possibility of applying some of the tools of modern technology to the
teaching-learning process. However, the "tools" that are proposed
differ with the professional background of the proponents. For example,
the psychologist is usually most interested in the learning theory
approach, in which stimulus-response concepts are selectively applied
to the micro-micro aspects of the educational process. Many
experimenters and theorists have contributed to this approach (Thorndike,
Hull, Skinner, Estes, etc.). Some of the most commonly quoted
concepts in this approach are:

1. Principle of reinforcement: Certain environmental effects 
strengthen the behavior which has produced these effects 
(a correct response to a question, properly rewarded, will 
increase the probability that the correct response will be 
subsequently elicited on meeting the same or similar 
question). 

2. Principle of gradual progression: Use a series of progressive 
approximations so as to lead, finally, to the required complex 
behavior. By giving reinforcement for each of the responses 
in the series making up the complex pattern, the desired 
behavior is gradually shaped. 



3. Immediacy of reinforcement: Probability of future correct
responses is inversely proportional to the time lapse between
a response and its reinforcement. Furthermore, opportunity
for frequent responding and reinforcement helps maintain
the learner's interest and attention.

Some of these psychologically prescribed techniques may sound very
similar to procedures which are currently used by many experienced
instructors, and indeed they are. However, there may be a difference
of degree. For example, Skinner breaks the learning sequence into
extremely small steps (generally short sentences), and he has
indicated that the only way economically to arrange the optimum
conditions of reinforcement, immediacy, precision and frequency of
response is in a teaching machine. There are problems with the
learning theory approach:

a. The early theories are relatively simple and ignore many of
the variables which affect human learning. Partly because
of this, experimental attempts to confirm the theories with
human subjects have not been spectacularly successful.
More complex multivariable formulations have been slow in
coming.

b. The reinforcement (the feedback of the "systems" approach) 
has been largely limited to the learner, and only haphazardly 
applied to the instructor, with the result that systematic 
improvements in instructional material or presentation 
methods are scarce. 

Another approach, often proposed by engineers, is the systems 
approach in which people (as students and as teachers) are major com- 
ponents in the system. Generally, this approach emphasizes the 
"control" advantage of feedback to the student, to the instructor, and 
to the system evaluator (faculty or society). Feedback control system 
analogies are loosely used with emphasis on inputs, outputs, transform 
means, and system constraints. This approach has the following 
problems: 

1. Educational "system" goals are difficult to express in 
operational terms. 



2. Outputs are difficult to evaluate.

3. The function of time, a necessary element of most feedback
control systems, has an ambiguous role in education.

Fundamental contributions from this approach will be limited until the 
above problems are resolved. 

The third approach is the data-handling approach, which is rela- 
tively unconcerned with any specific learning theory or method for 
evaluating the educational system outputs. The proponents of this 
approach claim that with any given sub-set of teaching-learning pro- 
cedures and with any given measure of output the use of modern data- 
handling and logic devices would permit much more extensive sampling 
of pertinent data and use of discriminative decision-making and that 
improvements in the teaching-learning process can be greatly accel- 
erated, largely on an experiential basis. 

None of the above approaches is completely independent of the 
others. Some balanced blend of the three will probably emerge. All 
approaches emphasize individual learning and the accumulation of know- 
ledge about the teaching-learning process. All point toward increased 
mechanization of the bookkeeping chores (grading, record keeping, 
scheduling); and at least the latter two approaches point toward mech- 
anization and possibly automation of the presentation of learning exper- 
iences to the student. 

An early study at UCLA [2], and many subsequent experiments,
indicated that certain kinds of mechanization are ill-advised, primarily
because use of the mechanism does not yield "better" student learning
than use of less expensive non-mechanized procedures, and some
mechanized devices actually hamper student learning. Nevertheless, it is
recognized that just to record and manipulate the multitude of contingent
circumstances which affect the teaching-learning process, an efficient
data-handling and logic device would be required. The modern digital
computer and ancillary equipment have the desired capability, and it is
therefore interesting to examine how some people have used the
computer in teaching systems.

B. Computers in Teaching Systems 

It will be noted that most of the computer-based teaching 
systems described below are micro-micro systems, concerned with 
small sequences of learning items. Historically, the impetus for the 
development of computer-based educational systems came from people 
primarily imbued with the learning theory approach, even though these 
people were often engineers or mathematicians working for companies 
dedicated to systems analyses or to computer design. Interest in the 
use of computers for larger (macro) systems has received later and 
less comprehensive consideration, and almost nothing has been done on 
input-output analyses for computer-based macro-macro systems. 

While the primary concern here is not with micro-micro systems 
(the so-called "teaching machines"), a review of the work on these 
micro-micro systems is revealing, because some fundamental
problems arise in these smaller systems which are typical of all 
educational systems, regardless of size. 

In 1958, Gustave Roth, Nancy Anderson and R. C. Brainerd of 
the IBM Research Center, following a suggestion from Dr. William J.
McGill of Columbia University, used an IBM 650 computer to simulate 
a teaching machine. The group was primarily interested in the general 
characteristics of teaching machines and felt that it would be easier and 
perhaps less expensive to simulate different kinds of teaching machines 
with an available computer than to actually construct a number of differ- 
ent kinds of teaching machines. This was the first of a series of 
investigations in which it was suggested that the computer was valuable 
for educational research purposes but uneconomical as a regular
training device.
Counting, addition, subtraction, multiplication and division in
binary arithmetic were taught to individual students via a typewriter
input-output station. The machine verifies the student's inputs digit
by digit and signals him only when he makes a mistake. The computer
program allows for individual differences in skill level and rate of
learning. If a student is making no errors, he is given an option to
skip 2, 1, or no problems.

When the student makes an error, the choice of the next problem
depends on the number of errors the student has made on that
section of the binary arithmetic course. If the student has made fewer
than 5 errors, the computer presents a problem at the same difficulty
level as the last problem he completed correctly. If he has made more
than 5 errors, he is presented a problem similar in difficulty to one
of the first problems in that section of the course. Therefore,
branching forward is at the student's option, and branching backward is based
on some a priori decision written into the computer program.
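
The branching rule just described can be summarized in a short sketch. The function and parameter names below are illustrative assumptions, not anything taken from the IBM 650 program itself, and the handling of exactly five errors (which the description leaves open) is likewise an assumption.

```python
def next_problem_level(errors_in_section, last_correct_level,
                       section_start_level, error_limit=5):
    """Pick the difficulty level of the next problem after a student error.

    All names here are hypothetical; only the rule itself comes from the text.
    """
    if errors_in_section < error_limit:
        # Fewer than five errors: stay at the level of the last problem
        # the student completed correctly.
        return last_correct_level
    # More than five errors: drop back to the level of the first problems
    # in the section. The text does not say what happens at exactly five
    # errors; this sketch assumes that case also branches backward.
    return section_start_level


def problems_to_skip(student_choice):
    """Forward branching is the student's option: skip 2, 1, or 0 problems."""
    if student_choice not in (0, 1, 2):
        raise ValueError("the student may skip only 0, 1, or 2 problems")
    return student_choice
```

Both rules are fixed in advance; nothing in the sketch adjusts the error limit or the skip options in the light of how students actually perform.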

Work by the group was discontinued in 1959, but in 1961 a new 
group under William Uttal resumed work on computer-based teaching. 
Encouraged by Professor Merrill Flood of the University of Michigan, 
the group believed that they could demonstrate the economic feasibility 
of computer-based teaching systems by providing multiplex student input- 
output stations per computer. Currently, the group is using a transis- 
torized IBM 1410, a multiplexer, four input-output buffers, a card punch 
and reader, one psychomotor skill station (for teaching stenotyping), 
six typewriter stations (for teaching statistics and German), a real time 
clock, and an IBM 355 digital disc storage unit with an IBM 652 control
unit which provides a random access audio memory (used for the
stenotype and German language training).

Some spectacular results have been obtained by Uttal's group.
For example, in the statistics course a group of six students completed
half the semester's work in an average of 5.3 hours with an average
mid-term examination grade of 94.3%, whereas a group of eight
matched students, taking a lecture course at a university from the
same instructor who wrote the program for the computer-administered
course, required 24 hours of class lectures plus an average of 25
hours homework to get an average grade of 58.4%.

In correspondence and conversations, Dr. Uttal admits
that the control programs are arrived at largely on an intuitive basis
and require a good deal of cut-and-try modification. No attempt is
made at optimizing the structure of the programs by experimentation. 
The computer is not being used to calculate anything, but rather is 
being used as a data throughput and comparison system. 

At Bolt, Beranek and Newman, Inc., J. C. R. Licklider,
J. A. Swets, and associates have been using a Digital Equipment
Corporation PDP-1 computer which can use either a typewriter or a
cathode-ray tube and light pen as an input-output station. Some of the
early work by this group in teaching sound discrimination to sonar
operators was unsuccessful, possibly because an a priori decision
was made to use branching techniques for student acquisition of
relatively meaningless non-verbal sounds which actually had very little
sequential relationship. Licklider and Swets' application of human
engineering techniques is perhaps more important than their
applications of learning theory to computer-aided teaching. By careful
consideration of the man-machine interface, they were able to reduce
learning time by at least 50%. Of further interest are their attempts
to teach relations between the symbolic and the graphical
representation of mathematical functions by having the student explore the effect
of changing the coefficients of an equation and watching the resultant
change of the graphical representation on the oscilloscope screen.
By careful attention to the multiplexing problem and by use of a special
purpose computer, the team at Bolt, Beranek and Newman, Inc. has
succeeded in bringing the cost of computer, ancillary equipment, and
overhead down to $1.50 per student-hour, and anticipates that these
costs could be further reduced to less than a dollar per student-hour,
which is well within the range of current teaching costs.

At the Coordinated Science Laboratory of the University of 
Illinois, engineers D. L. Bitzer, P. G. Braunfeld, and W. W. 
Lichtenberger have used the old ILLIAC computer in conjunction with 
two alpha-numeric student input stations, course material stored on 
an electronically scanned set of slides, and two TV tube output stations. 
The computer program is cleverly conceived to allow for individual 
student differences. The program provides the student with an 
opportunity to determine the branching procedure by giving him the 
option to call for "help" sequences or to transfer out of a "help" 
sequence at any point in the sequence. The student can also use the
computer for computational work to help speed solutions to problems 
in which computational skill is not the primary objective of the lesson. 

At the System Development Corporation, John Coulson and 
Harry Silberman initially used a Bendix G-15 computer, random 
access slide projector and buffering system, a typewriter input 
station and opaque screen output station. This was a single station 
system, but more recently SDC has been using a Philco 2000 computer 
with a twenty-station multiplexed system. The student station contains
multiple choice buttons for student inputs and a numbered read-out 
window which guides the student to numbered items in a programmed 
text. 

The new SDC installation is also the first to try to go beyond
the micro-micro approach, in that consideration is given to using the
computer as a data-handling device which would provide diagnostic
information on student performance to the teacher-counselor and to
the instructor-program writer and would provide scheduling and
"systems evaluation" to the school administrator.

All of the computer-based systems mentioned above place
considerable emphasis on flexibility in selecting items of instruction to
present to the student. Different items can be presented to each
student depending on his history of responses to previous items.
However, there is a major flaw in all of the above-mentioned procedures.
A fixed set of rules as set down in the computer programs controls
the teaching-learning process. These rules are usually intuitively
determined and their effectiveness is seldom verified by systematic
experimentation. Almost all of the people mentioned above relate how
much time they spend changing elements of their computer control
programs, without describing procedures for evaluating and modifying
the a priori elements of these programs. This is somewhat surprising, since
most of these experimenters agree that feedback on student progress
could be used for on-line alteration of the curriculum sequence or
pedagogical procedures, and would probably have more important
long-range cultural significance than the simple feedback (knowledge
of results fed back to the student) currently in use.

One of the earliest proponents of a variable, rather than a
fixed, decision process in a teaching system was Gordon Pask of
Systems Research, Ltd. His earlier work on "self-organizing"
systems led to his propounding [3] the idea of a self-organizing
teacher (automaton) whose first problem is to find a language common
to both itself and the student so that the two can "talk" to each other.
To establish such conversational interaction, the automaton must be
capable of theorizing and model building, and of eventually building,
by trying different strategies (arising from different "theories"), a
model which relates the automaton to the student in a satisfactory
manner. Then it can effectively communicate new concepts to the
student. Pask suggests that such an adaptive teaching machine can
be designed in complete ignorance of how students learn. Essentially,
the automaton pragmatically discovers how students learn by trying
to get students to perform specified tasks.

Pask fails to mention two important criteria in his description 
of the self-organizing teacher. He does not hint at what would
constitute an optimum procedure for trying different strategies, nor does
he specify the criterion for determining what constitutes a satisfactory 
relationship between automaton and student. 

The machines which Pask's associates have actually built are
very cleverly designed training devices, but they do not incorporate
the self-organizing concepts suggested above. Rather, they are
adaptive at the same level as the computer-controlled devices
mentioned on earlier pages; i.e., they adjust the difficulty level of the
instructional material to the performance level of the individual
student. One of the earliest adaptive devices developed by Pask's group
was for radar operator training [4], but the best known device is the
Solartron Automatic Keyboard Instructor (SAKI) for training operators
of keypunch machines. "SAKI" demonstrates that, at least for special
purpose teaching situations, certain decision functions can be
performed by compact electronic devices far less complex than the
digital computers employed by other research groups.

A student using "SAKI" views an exercise line consisting of
alpha-numeric characters which are illuminated one at a time, each
for a different length of time. Simultaneously, the student attempts
to replicate the characters by depressing the keys on a key-punch
machine. A separately illuminated display of the keyboard layout
indicates to the student the correct key to depress at the same time
that a particular exercise character is being illuminated. This
helpful information may be withheld, either completely or partially. If
completely withheld, the keyboard layout display lamps are not
illuminated; if partially withheld, these lamps are illuminated after
a delay period, i.e., some milliseconds after the exercise character
has been illuminated.

Unfortunately, the published article [5] which describes the
mathematical model of "SAKI" has a number of errors and
ambiguities which make a meaningful description of the internal mechanisms
of the device impossible. These errors are discussed in Appendix A.

One encounters similar inconsistencies in later papers by
Pask. However, of more serious consequence is Pask's use of a
probability decision process in his adaptive systems. Every time (t)
that a teaching routine must be selected from a set of available
routines, a calculation is made for each teaching routine of the
probability, $P_j(t)$, that the j-th routine will yield good results. The
$P_j(t)$ are calculated from the results, $p_j$, obtained
from prior use of the j-th teaching routine. The probability of the
selection of a particular routine is proportional to the $P_j(t)$. This is
in essence a Monte Carlo sampling mechanism, and it can be
demonstrated that for stationary rules for mapping the $p_j$ into $P_j$, as
$t \to \infty$, the average system pay-off will asymptotically approach
a weighted average of the means of the $p_j$, the weights being the
limiting selection probabilities.
An obviously better procedure than that suggested by Pask would be
one where the average system pay-off asymptotically approached the
supremum of the means of the $p_j$. Such a procedure will be discussed
in Section III.
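
The gap between a probability-proportional selection rule and a rule that settles on the best routine can be illustrated numerically. The sketch below assumes that the selection probabilities have settled to fixed values and that each routine has a fixed mean pay-off; the function names and the sample numbers are hypothetical and are not taken from Pask's papers.

```python
def proportional_selection_payoff(limiting_probs, mean_payoffs):
    """Long-run average pay-off when routine j is chosen with probability
    proportional to limiting_probs[j] (assumed stationary)."""
    total = sum(limiting_probs)
    return sum(p / total * m for p, m in zip(limiting_probs, mean_payoffs))


def best_mean_payoff(mean_payoffs):
    """Pay-off approached by a procedure that settles on the best routine."""
    return max(mean_payoffs)


# Hypothetical example: two routines whose success rates are 0.2 and 0.8,
# with selection probabilities proportional to those same rates.
means = [0.2, 0.8]
mixed = proportional_selection_payoff(means, means)
best = best_mean_payoff(means)
```

With these made-up figures the proportional rule averages about 0.68, against 0.8 for a procedure that converges on the better routine, because the poorer routine continues to be selected one time in five.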




Most recently (July, 1962) the M.I.T. Press published a book,
A Decision Structure for Teaching Machines, based on the Ph.D.
dissertation of Richard D. Smallwood [6] (Electrical Engineering
Department, M.I.T.). Before outlining his decision structure,
Smallwood makes some rather strong assumptions:

1. It is possible to specify a matrix of blocks of instructional 
information, where rows represent the logical sequence of 
concepts and columns represent alternate forms of informa- 
tion within each row. 

2. The probability that a student will respond correctly to a 
given block is equal to the fraction of students who have 
previously responded correctly on that block, regardless 
of the previous histories of learning experiences of the 
students.

3. Even though a "logical" ordering of blocks must exist, the
probability of responding correctly on a block is considered 
to be independent of the sequence of blocks which were 
previously seen by the student and independent of his score 
in those blocks. 

Smallwood makes other assumptions about the validity of 
certain theories of learning (reinforcement, self-pacing, small item 
size, etc.) which are not really essential for the development of his
decision structure and only serve to limit the applicability of that 
decision structure. 

The object of the decision process is to select which one of 
the instructional blocks from the matrix of possible blocks of inform- 
ation to present next to a given student. 

The decision process has as its criterion: maximize the
individual student's expected score until this score is above an
(arbitrarily) specified minimum level; thereafter minimize the
student's expected time to finish the course. The decision as to which
block of information a student would be shown next was made as
follows:

1. Toss a coin. If "heads", assign student to block for which
the average score of previous students' responses was
highest (or time was lowest, depending on which part of
the compound criterion is governing the process at that
instant).

2. If coin toss comes out "tails", assign student to block which
has been given to previous students the least number of times.
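
The two-step coin-toss rule can be sketched as follows. This is a minimal illustration that covers only the score-maximizing half of the compound criterion; the function name, the dictionary layout, and the injectable `rng` argument are assumptions made for the example and do not come from Smallwood's program.

```python
import random


def choose_block(average_scores, times_given, rng=random):
    """Select the next instructional block by the coin-toss rule.

    average_scores: dict mapping block name -> average score of previous students
    times_given:    dict mapping block name -> number of times the block was given
    rng:            any object with a random() method (hypothetical hook so the
                    coin can be fixed when checking the rule)
    """
    heads = rng.random() < 0.5
    if heads:
        # "Maximizing" rule: block with the highest average score so far.
        return max(average_scores, key=average_scores.get)
    # "Information gathering" rule: block tried the fewest times so far.
    return min(times_given, key=times_given.get)
```

Note that the coin is fair and independent of the data: the rule explores half the time no matter how much is already known about each block.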

Smallwood also suggests an alternate decision process, namely, 
that confidence intervals on the parameters determining the average 
score for each block be estimated, and when "too great a difference 
in the confidence intervals" for the different blocks exists, that the 
block with the largest confidence interval on the average score be 
selected. 

Neither of the decision processes given above actually meets
the stated criterion. The arbitrary choice of coin tossing to determine
when to use the "maximizing" rule and when to use the "information
gathering" rule is obviously non-optimal. Also, choosing the block
with the largest confidence interval ignores the fact that this block
may also have one of the smaller average scores. Thus, in both
schemes, the process may choose blocks which result in sub-maximum
scores with unnecessary frequency.

Furthermore, there is a contradiction between the criterion
and the reasons given for using the particular decision processes. If
the criterion is to maximize a specific individual student's expected
score, then one should always assign this student to the block which
has the highest average score of previous students' responses (this is
similar to the decision process recommended by Bradt, Johnson &
Karlin [7] for the two-armed bandit problem where there is but one
play remaining). The implication one draws from the use of a
"forced" choice (a non-maximizing choice) is that information gained
from such "forced" choices will be of use in selecting the expected
maximum block for later students. Therefore, the decision process
does not adhere to the stated criterion of maximizing a particular
student's expected score but rather implies that the criterion is to
maximize the sum of all students' scores, i.e., maximize

$S_N = {}_1X + {}_2X + \cdots + {}_nX + \cdots + {}_NX$.

This point will be the key to the next section.



SECTION III 



CRITERION AND DECISION RULES 
FOR AN ADAPTIVE SYSTEM 

A. Criterion Function 

Some confusion in discussions on adaptive systems could 
possibly be avoided if everyone took pains to describe the level or 
levels of adaptive behavior involved in each system. All of the 
devices described in the preceding section are called "adaptive 
devices" by their creators, but the level of adaptivity is not the same 
in all cases. For educational systems (regardless of size) the follow- 
ing levels of adaptive behavior are defined: 

Zero Level Adaptive Behavior: A fixed, preconceived strategy (or
pedagogy) is used for presenting to all students a fixed, preconceived
set of courses or list of subject matter.

First Level Adaptive Behavior: A fixed strategy which uses an
individual student's past history of performance to determine which
particular course or list of subject matter from a preconceived set
of such courses or subject matter is shown to that individual student.

Second Level Adaptive Behavior: The particular courses or list of
subject matter which is shown to a particular student is determined
by a fixed strategy which uses an individual student's past history of
performance and the history of performance of all students who have
previously gone through the system.

Third Level Adaptive Behavior: A set of strategies for presenting
students with courses or lists of subject matter is available. The
choice of a particular strategy for a particular student depends on
the history of performance for each of the strategies.

(Separation of strategies and courses or lists of subject
matter is a verbal convenience. Lists of subject matter
could just as readily be considered sub-sets of strategies, in
which case the ideogram is simplified. This also eliminates
the distinction between second and third level adaptive
behavior. Hereafter, use of the word "strategy" will imply
both the pedagogical technique and the subject matter
employed by the pedagogical technique.)

The zero and first level adaptive systems do not include
provisions for data gathering or experimentation within the system.
These systems are non-optimizing and their success is largely
dependent on the subjective choice of the strategy.

With the exception of Smallwood's system, all of the
computer-controlled micro-micro systems described in Section II fall
into the zero or first level of adaptive behavior, even though it can
be shown that elaborate data processing equipment need be used for
such systems [8], [9].

Systems with higher levels of adaptive behavior must include
provisions for storing information on students' performance and for
experimenting, i.e., trying different strategies. In such systems
students are simultaneously learners and "experimental subjects",
and the traditional experimental approach of ignoring the effects on
students who have been exposed to sub-optimal regimens should not
be tolerated. It is this consideration which leads to the choice of the
criterion: Maximize $S_n$, the sum of all students' net output. This
criterion becomes increasingly important where changes in strategy
(pedagogical techniques and/or subject matter) occur relatively
frequently, so that the total number of students who could possibly be
exposed to a given set of strategies is relatively small. Conversely,
this criterion is needed for systems in which frequent change
(hopefully towards the "better") is a desirable feature.

For second or higher level adaptive systems, the criterion
stated above is equally applicable to micro-micro, micro, macro,
and macro-macro systems. Therefore, in exploring the possible
decision rules or procedures which could meet the stated criterion,
the problem will be treated in a general way and no mention will be
made of the size of the system. Later, when considering the problem
of collecting data for systems which use the stated criterion, the size
of the system will again be of some consequence, and systems of
different size will have to be treated separately.

B. Sequential Decision Rules 

For the general situation (independent of the system's size) let
${}_nX_{ij}$ be a collection of random variables defined on a
probability space $\mathcal{F}$. ${}_nX_{ij}$ may be thought of as the
random quantity that represents the n-th drawing in a sequence of
drawings from a set of populations,
$\pi_1, \pi_2, \ldots, \pi_j, \ldots, \pi_k$, where the subscript "i"
indicates the i-th drawing from the j-th population. The $\pi_j$
populations are specified by their cumulative distribution functions,
$F_j(x)$. It is assumed that these functions of the random variables
have expectations or means.



In the application to an educational system, the set of populations
could represent different pedagogical procedures or different sequences
of learning items, such as the blocks used by Smallwood. The random
variable $X$ is considered, in some mysterious way, to represent the net
return attributable to bringing together the n-th student and the j-th
experience. Later, in Sections IV and V, an attempt will be made to
unravel the mystery of how one finds $X$ from such measurable
descriptions as student learning time, teacher's time, student test
scores, system capital costs, etc. It should suffice here to hint that
$X$ is likely to be a complicated functional of functions of random
variables and that due consideration will have to be given to the
stability of any decision process proposed for use in a real educational
system; i.e., the decision process should preferably be one which
guarantees that the error in the answer is no worse than the errors in
the initial data, and conversely, one should not expect the solutions to
have an error magnitude less than the errors in the initial data.

One more clarification is necessary at this point. The n stu- 
dents represent a set from a population of students. It is assumed 
that there is an isomorphic mapping from the set of the n available 
students to each of the population distributions and that each trans- 
formation is independent (though not necessarily dissimilar) from the 
others. Note that the "mapping" is from students to measures on the
students, and the measures include all information on prior states of 
the students. That is, the X represents net returns or, if you will, 
a utility of the increase in performance ability as a result of being 
exposed to a particular educational experience (the transform). 

Before considering adaptive decision procedures for maximizing
$S_n$, some boundaries must be placed on the problem. Adaptive
decision procedures will only be considered for the case where one
desires to maximize $S_n$ for an a priori set of possible strategies. In
this scheme, non-contender strategies (i.e., those strategies with
little chance of being selected as the "best" strategy) can be eliminated
prior to the termination of the process, but new strategies can only be
introduced for consideration before the process begins. Whenever a
new contender comes to light, the problem is terminated and a new
problem initiated. The same adaptive decision procedure may be
used for the first and second problems, though it is more likely that
a different decision procedure would be used for each, since more
a priori information would be available for the second than for the
first problem.

Herbert Robbins [10] in 1952 first focused attention on the
problem of how to draw a sample ${}_1X, {}_2X, \ldots, {}_nX$ from two
populations if the object is to achieve the greatest possible expected
value of the sum $S_n = {}_1X + {}_2X + \cdots + {}_nX$. Robbins
indicated that this problem fits into the general context of sequential
design of experiments, in which the size and composition of samples
are not fixed in advance but are themselves functions of the
observations, and as such, was the outgrowth of earlier work by Dodge
and Romig [11] in double sampling inspection methods, and Wald's [12]
theory of sequential design.

The available a priori information plays a most important 
role in the selection of a decision procedure, and some a priori 
knowledge conditions will be outlined below. 

First, there is the "maximum ignorance" case, where there
is no a priori knowledge of the distributions of the $X_j$, the relative
magnitude of the $\mu_j$, nor of the total number of students (max n = N)
available prior to the termination of the process. Sub-classifications
of this case occur for $n \to \infty$, and when "nature" can call a halt
to the process at any n. Variations of this case occur for the process
terminating at: N, a known constant; at N, given a known probability
distribution on N; at a point given by a function of n.

Secondly, a priori knowledge may exist on the distributions
of the $X_j$. The distributions may be binomial, gaussian, etc. It is
conceivable that for a given problem some of the $X_j$ will have one
distribution and others of the $X_j$ will have another distribution. The
same sub-classifications given for the "maximum ignorance" case
hold here too, namely: $n \to \infty$, N = unknown constant, N = known
constant, and stopping at a point given by a function of n. Further
sub-classification can be made for existence or non-existence of
a priori estimates of the population parameters. In all of the above
cases, the sampling process could be continuous until the end of the
process, or, where the cost of making observations on samples is of
consequence, the sampling could terminate before the end of the
process. Some of the possible cases, separated according to the
classifications given above, are shown in Figure 1. Those cases which
will be discussed in more detail below are also indicated in Figure 1.

Case I A i. For the case of maximum ignorance, where the only
thing known is that the distributions in $\mathcal{F}$ have finite means, no
decision rule can be specified which will ensure that the sum of the
net values of the observations $S_n$ will be a maximum. However, if it
is known that for each distribution there exists a second (or higher
order) moment which is uniformly bounded, then C. L. Mallows and
Herbert Robbins [13] suggest a decision rule which maximizes $S_n$ in
the sense:

$\lim_{n \to \infty} \frac{S_n}{n} = \mu_K$ with probability one,

where $\mu_K$ is the mean of the population with the highest mean.

The recommended decision rule entails the following:

a. Specify a sequence $B_1, \ldots, B_j, \ldots, B_k$ of disjoint monotonic
sequences of integers, with $B_j = \{b_{j1}, b_{j2}, \ldots\}$ and
$B_j \cap B_{j'} = \emptyset$ for $j \ne j'$ (i.e., the intersection of two
sequences is the empty set), where $B = \bigcup_{j=1}^{k} B_j$ and

$(b_{j2} - b_{j1}) < (b_{j3} - b_{j2}) < \cdots$,

to which we add that, for convenience, $b_{j1} = j$.

b. If $n \in B_j$, use decision $D_1$: select the n-th observation
from the j-th category.

c. If $n \notin B$, use decision $D_0$: select the n-th observation
from the category which had the highest sample mean at the
(n-1)th trial.

An observation selected according to $D_0$ is called free, and one
selected according to $D_1$ is called forced.

[Figure 1. Cases of a priori information in decision processes]

Forced observations, made according to a predetermined
sequence of inspection epochs, are required for the proof of

$\lim_{n \to \infty} \frac{S_n}{n} = \mu_K$ with probability one,

and also satisfy the intuitive notion that some such procedure should
be used to reduce the small but finite probability that the selection
process becomes "trapped" in a category which does not have the
maximum mean. This possibility of being "trapped" is readily
illustrated in the following simple example:

Then at n = 3, j = 2, and since

$E(X_2) = \mu_2 = 0 > {}_1X_{11}$,

it is possible that using $D_0$ no further observations will be made from
the j = 1 category which has the larger mean.

It was the concern over the possibility of "trapping", or as he
put it: "the dangers . . . that the decision process may eliminate some
of the alternatives from consideration because of lack of data on the
consequence of the alternatives", that led Smallwood to use the
coin-tossing analog to select forced and free decision rules. The
drawback of Smallwood's forcing rule is that no matter how much
information is accumulated on the categories whose sample means
$\bar{X}_j < \bar{X}_K$, the frequency of selecting from these j categories
remains unchanged.

Robbins' $B_j$ is completely arbitrary, within the limits defined
above for $B_j$, and for $n \to \infty$ one set of $B_j$ is just as good as
another. However, the case of $n \to \infty$ is not of particular interest
within the context of the type of evolving adaptive educational system
that has been suggested earlier.

Case I B. No unique solution exists for this case.

Case I C i. For the case of a finite N, convergence with probability
one cannot be demonstrated every time the problem is run as in
Case I A i. The best that can be expected of a decision rule for
finite N is that the expected value of $S_N$ is maximum in some sense,
i.e., that

$\lim_{u \to \infty} \frac{E(S_{N,u})}{Nu}$

is a maximum, where u is the number of times the problem is repeated,
in which case there is an intuitive appeal to the decision rule which
requires k initial observations (selecting one observation from each
category) and thereafter that the n-th observation be made from the
category which had the maximum sample mean at the (n-1)th trial.

The expected value of $S_N$ before any observations are taken is:

$E(S_N) = \sum_{j=1}^{k} \mu_j + \sum_{n=1}^{N-k} \sum_{j=1}^{k} ({}_nP_j)\,\mu_j$

where the first sum on the right-hand side of the equation is the
expected value of the first k observations (one from each category)
and the second sum represents values from a branching tree, where
at each junction point on the tree there exists a probability
(${}_nP_j$) that one of the k categories will be selected. With no
knowledge about the distributions, the ${}_nP_j$ cannot be estimated,
and neither analytical nor computational solution exists for this
case. However, a simulation study of some possible decision rules is
revealing.

Although one of the conditions initially imposed on the decision
rule for this case is that it should be usable for any set of populations
for which the cumulative distribution functions have finite expectations,
and there exist second (or higher) order absolute moments which are
uniformly bounded, sets of k normally distributed populations having
equal variances and equal contrasts between the means were selected
for convenience in the simulation. (The computational work was done
on the University of California IBM 7090, IBM 1401 and IBM 1620
computers.)

Since the current study is a part of an old and continuing
search for an appropriate framework for adaptive educational systems,
the decision rules suggested by earlier experimenters were included
in the simulation study. Admittedly, some of these rules could,
under certain conditions, be eliminated from consideration by
analytical methods. However, these rules were examined for three
reasons: 1) the rules could possibly be used under conditions where
they could not readily be eliminated by analytical methods; 2) even
where analytical methods could theoretically be used, the analytical
methods may be more cumbersome than the empirical methods;
3) some insight was desired on the magnitude of the difference
resulting from using the different decision rules, including the
admittedly inferior rules. The possibility existed that a "good" rule
might be so much more difficult to implement in the real world as not
to justify its use, particularly if the "inferior" rule yielded results
not too much below those of the "good" rule.

The following sampling decision rules were tested:

RULE 1. For $n \le k$ select one observation from each of the k
categories. For n > k select the n-th observation from that
category which had the highest sample mean at n - 1.

RULE 2. For $n \le k$ select one observation from each of the k
categories. For n > k flip an unbiased coin. If "heads", select
the n-th observation from that category which had the highest
sample mean at n - 1. If "tails", select the n-th observation
from the category from which the least number of observations
has been made.

RULE 3. For $n \le k$ select one observation from each of the k
categories. For n > k flip an unbiased coin. If "heads", select
the n-th observation from that category which had the highest
sample mean at n - 1. If "tails", select the n-th observation
from the category which had the highest product of the sample
mean and the sample standard deviation at n - 1.

RULE 4. For $n \le k$ select one observation from each of the k
categories. For n > k select the n-th observation from the
category which had the highest product of the sample mean and
the sample standard deviation (s) at n - 1.

RULE 5. For $n \le k$ select one observation from each of the k
categories. For n > k select the n-th observation such that the
probability that the n-th observation will come from the j-th
category is:

${}_nP_j = \dfrac{({}_{n-1}\bar{X}_j)\,({}_{n-1}s_j)}{\sum_{j=1}^{k} ({}_{n-1}\bar{X}_j)\,({}_{n-1}s_j)}$

RULE 6. For $n \le k$ select one observation from each of the k
categories. For n > k select the n-th observation such that the
probability that the n-th observation will come from the j-th
category is:

RULE 7. For $n \le k$ select one observation from each of the k
categories. For n > k, if $n \in B_j$, select the n-th observation from
category j. If $n \notin B$, select the n-th observation from the
category which had the highest sample mean at n - 1.

Rule 2 is Smallwood's decision rule. Rule 4 is derived from
an untested suggestion by Smallwood. Rule 3 is a mixture of Rules 2
and 4. Rule 6 is Pask's decision rule. Rule 5 is a mixture of
Smallwood's Rule 4 and Pask's Rule 6. Rule 7 is Robbins' decision
rule. Rule 1 is a simplification of Robbins' rule; i.e., it is the case
where the set B is the empty set.

Another rule:

RULE 8. For $n \le k$ select one observation from each of the k
categories. For n > k modify the forcing set B according to
sub-rule Z; then if $n \in B_j$, select the n-th observation from
category j. If $n \notin B$, select the n-th observation from the
category which had the highest sample mean at n - 1.

Rule 8 was not tested because "sub-rule Z" could not be
specified at this point in the investigations. It was hoped, however,
that the simulation study would shed some light on possible sub-rules.
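The three rules whose statements survive in full (Rules 1, 2 and 7) can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions, not the program run on the IBM machines: the function name `simulate`, the dictionary form of the forcing set, and the use of Python's `random` module are all hypothetical choices. The test populations are taken to be normal with a common standard deviation, as the text describes.

```python
import random
from statistics import mean

def simulate(rule, mus, sigma, N, forcing=None, rng=None):
    """Return S_N / N for one run of the sequential sampling problem.

    rule    -- 1 (highest sample mean), 2 (coin toss), or 7 (forcing set)
    mus     -- population means of the k normal test populations
    forcing -- Rule 7 only: dict mapping trial number n -> category index
               (a hypothetical encoding of the forcing set B)
    """
    rng = rng or random.Random()
    forcing = forcing or {}
    k = len(mus)
    sums = [0.0] * k     # running sum of observations per category
    counts = [0] * k     # number of observations per category
    total = 0.0
    for n in range(1, N + 1):
        if n <= k:
            j = n - 1                            # one observation from each category
        elif rule == 7 and n in forcing:
            j = forcing[n]                       # forced observation
        elif rule == 2 and rng.random() < 0.5:   # "tails": least-sampled category
            j = min(range(k), key=lambda c: counts[c])
        else:                                    # free observation: best sample mean
            j = max(range(k), key=lambda c: sums[c] / counts[c])
        x = rng.gauss(mus[j], sigma)
        sums[j] += x
        counts[j] += 1
        total += x
    return total / N

if __name__ == "__main__":
    mus = [85, 75, 65, 55]                       # contrast of 10, highest mean 85
    # Unscrambled k = 4 forcing set: trial number -> category index (A=0 ... D=3)
    forcing = {9: 0, 36: 0, 11: 1, 44: 1, 14: 2, 53: 2, 18: 3, 63: 3}
    rng = random.Random(0)
    for rule in (1, 2, 7):
        avg = mean(simulate(rule, mus, 10, 100, forcing, rng) for _ in range(500))
        print(rule, round(avg, 1))
```

Rules 3 to 6 are omitted because they depend on the sample standard deviation or on selection probabilities whose formulas are not fully legible in this copy.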

For Rule 7, the following arbitrary forcing set B was specified:

Category     Set B, k = 4     Set B, k = 6     Set B, k = 8

   A            9    36          9    36          9    36
   B           11    44         10    40         10    40
   C           14    53         11    44         11    44
   D           18    63         14    48         12    48
   E                            17    53         14    50
   F                            18    63         15    53
   G                                             17    58

At each of the u repetitions of the problem, the numbers in
each column were randomly scrambled. For example, for the second
iteration of the problem with k = 4, the forcing set was:

Category     Set B

   A           11    53
   B            9    63
   C           14    44
   D           18    36



The integers in each column of Set B were selected so that no
matter what combination of integers randomly appeared in the first
and second columns, adherence would be made to the restriction that:

$(b_{j2} - b_{j1}) < (b_{j3} - b_{j2})$,

where

$b_{j1} = j$.
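The spacing restriction can be checked mechanically for the tabulated sets. The sketch below is a hedged illustration: it assumes that the two printed columns are $b_{j2}$ and $b_{j3}$, that $b_{j1} = j$ (the initial observation from category j), and that scrambling may pair any first-column entry with any second-column entry. The names `SET_B` and `spacing_ok` are hypothetical. Only the k = 4 and k = 6 sets are used, since the k = 8 table is incomplete here.

```python
from itertools import product

# Columns of Set B as tabulated for k = 4 and k = 6.
SET_B = {
    4: ([9, 11, 14, 18], [36, 44, 53, 63]),
    6: ([9, 10, 11, 14, 17, 18], [36, 40, 44, 48, 53, 63]),
}

def spacing_ok(k, first_col, second_col):
    """True if (b_j2 - b_j1) < (b_j3 - b_j2) for every possible pairing,
    taking b_j1 = j for categories j = 1..k (an assumption)."""
    return all(
        (b2 - j) < (b3 - b2)
        for j, b2, b3 in product(range(1, k + 1), first_col, second_col)
    )

if __name__ == "__main__":
    for k, (col1, col2) in SET_B.items():
        print(k, spacing_ok(k, col1, col2))
```

Under these assumptions the tightest case is j = 1 with 18 in the first column and 36 in the second, where 17 < 18 still holds.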



A preliminary set of simulations was made to demonstrate how
each rule behaved in individual iterations of the problem. An example
is shown in Table 1, where Rule 1 was used with k = 4, $\mu_K$ = 85,
$\sigma$ = 10, and contrast of 10. In the first run, "trapping" occurred
in $\pi_2$. In the second run, all observations after the first k are
taken from the category which has $\mu_K$. In the third run, some
switching between $\pi_3$ and $\pi_4$ occurs before all subsequent
observations are taken from the category which has $\mu_K$. The results
of these preliminary simulations should be borne in mind during all the
subsequent discussions, which will deal exclusively with averages or
expectations over many repetitions of the same problem.
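How often a single run of Rule 1 ends in a sub-maximum category can be estimated with a short sketch. The measure used here (the category chosen for the final observation differs from the one with the highest mean) is an assumption made for illustration, as are the function names; it is not a definition taken from the report.

```python
import random

def rule1_final_category(mus, sigma, N, rng):
    """Run Rule 1 once and return the category of the N-th observation."""
    k = len(mus)
    sums = [rng.gauss(m, sigma) for m in mus]   # first k observations, one each
    counts = [1] * k
    j = 0
    for _ in range(N - k):
        # free choice: category with the highest sample mean so far
        j = max(range(k), key=lambda c: sums[c] / counts[c])
        sums[j] += rng.gauss(mus[j], sigma)
        counts[j] += 1
    return j

def trapping_rate(mus, sigma, N, runs, seed=0):
    """Fraction of runs that finish in a category without the highest mean."""
    rng = random.Random(seed)
    best = max(range(len(mus)), key=lambda c: mus[c])
    misses = sum(rule1_final_category(mus, sigma, N, rng) != best
                 for _ in range(runs))
    return misses / runs

if __name__ == "__main__":
    print(trapping_rate([85, 75, 65, 55], 10, 100, 500))
```

A run counted this way is not necessarily trapped permanently; the figure only indicates how the individual runs behind the averages can differ.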

The "Expected Values" of $S_N/N$ were obtained from 500 iterations
of the same problem. These $E(S_N/N)$ were obtained for values of N
from 1 to 100 ($\sigma$ = 10, 20, 30; k = 4, 6, 8; contrasts of 5 and 10)
and are summarized in Table 2. Also shown for each combination of
N, k and contrast is the maximum expected $S_N/N$, i.e., the value that
would be obtained if the first k selections yielded numbers equal to
$\mu_1, \mu_2, \ldots, \mu_k$ and the subsequent (N - k) selections all
yielded numbers equal to $\mu_K$.
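The maximum expected $S_N/N$ described above follows directly from its definition. A minimal sketch, with the hypothetical function name `max_expected_average`:

```python
def max_expected_average(mus, N):
    """(sum of all k means + (N - k) * highest mean) / N."""
    k = len(mus)
    if N < k:
        raise ValueError("N must be at least k")
    return (sum(mus) + (N - k) * max(mus)) / N

if __name__ == "__main__":
    # k = 4, highest mean 85, contrast of 10, N = 100
    print(max_expected_average([85, 75, 65, 55], 100))  # (280 + 96 * 85) / 100 = 84.4
```

For k = 4, $\mu_K$ = 85 and a contrast of 10, the ceiling at N = 100 is therefore 84.4, slightly below $\mu_K$ because of the k initial observations.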

Examination of Table 2 reveals that Rules 1 and 7 (derived
from Robbins) yield consistently better results than Rule 2 (Smallwood)
and Rule 6 (Pask), and also better results than the "mixed" Rules 3,
4, and 5. For reasons that are fairly obvious, the results of Rule 3
should approach the results of Rule 1, and the results of Rule 5 should
approach the results of Rule 6. Rule 6 yields results which approach
the theoretical value. It is plainly useless to continue
considering the rules suggested by Smallwood and Pask for their
adaptive systems.

[Table 1. Simulation of Rule 1: k = 4; $\mu_K$ = 85; $\sigma$ = 10; contrasts = 10. Runs 1, 2 and 3; tabulated values not preserved.]



Focusing attention on Rules 1 and 7, it is observed that in the
case of small contrasts and large $\sigma$, there is a relatively high
probability that Rule 1 selects from sub-maximum categories in the
early trials; therefore, Rule 7 shows up better than Rule 1. Where
contrasts are large and $\sigma$ is small, Rule 1 picks fewer sub-maximum
categories than the number "forced" by Rule 7. This suggests that
the smaller the contrasts, and the larger the $\sigma$, the more dense
set B should be.

Also, as the number of categories, k, increases, Rule 7
yields lower results, since the number of "forced" selections
increases directly as k increases, while the likelihood of finding the
category with the maximum mean by the use of "forced" selection
decreases with an increase in k.



Case I C ii. If the cost of taking observations, C, and the initial
set-up cost for a category, A, are considered, the expected value of
$S_N$ before any observations are taken is

$E(S_N) = \sum_{j=1}^{k} \mu_j + \sum_{n=1}^{N-k} \sum_{j=1}^{k} ({}_nP_j)\,\mu_j - \sum_{j=1}^{k} n_j C_j - \sum_{j=1}^{k} A_j$,

where $\sum_{j=1}^{k} n_j = N$.

If optimal stopping is permitted, say, at the n*-th trial, where
k < n* < N, and the remaining N - k - n* observations are taken
from the category with the highest sample mean,

$E(S_N) = \sum_{j=1}^{k} \mu_j + \sum_{n=1}^{n^*} \sum_{j=1}^{k} ({}_nP_j)\,\mu_j + \cdots - \sum_{j=1}^{k} n_j^* C_j - \sum_{j=1}^{k} A_j$,

where $\sum_{j=1}^{k} n_j^* = n^*$,

which is unsolvable for the same reasons as given in Case I C i.

Again, these questions were explored by a computer simulation of the
use of the various rules on specified normal "test" populations. Three
sampling costs were considered: no cost (where obviously one should
never stop taking observations); a cost of one percent of the $\mu_K$
for all $\pi_j$; and a cost of ten percent of $\mu_K$. Four values of N
were selected: N = n* (the sequential selection process stops and no
students remain to assign to the category with the largest sample
mean); N = 100; N = 1,000; and N = 10,000. Instead of having to compare
each line of $E(S_N/N)$ values with its maximum $E(S_N/N)$, as was the
case in Table 2, the results of the first k observations were excluded
from the summations (though not from the decision-making procedure)
shown in Tables 3 and 4, with the result that there is a single
standard of comparison for $E\left(\frac{S_{N-k}}{N-k}\right)$, taken
over five hundred iterations.



A preliminary examination of Rules 2, 3, 4, 5, and 6 under 
the above conditions again showed that these rules yield lower results 
than do Rules 1 and 7. Rule 1, of course, is the same as Rule 7 with 
the set B as the empty set. The set B used in Rule 7 for computing 
the expected values of Table 2 can be considered a moderately dense 
set and was used again for the computations of Tables 3 and 4.

TABLE 4

EXPECTED VALUES WITH OPTIMAL STOPPING

(partial; row and column labels illegible)

78.8  76.3  79.5  78.7  76.2
84.0  84.3  83.1  83.7  83.9  82.8  80.4  80.9  78.8  80.1  80.5  78.8
84.0  84.2  82.4  83.4  83.7  81.9  80.7  81.7  79.7  80.2  81.2  79.2
84.0  84.2  81.1  83.2  83.5  81.0  80.7  81.8  79.4  80.0  81.1  78.7

A "full set" B is one with each integer present, resulting in all
observations being taken according to $D_1$, the "forcing" decision.
Such a full set does not meet the restriction that

$(b_{j2} - b_{j1}) < (b_{j3} - b_{j2}) < \cdots$,

but does present a convenient opposite to the other extreme case of
an empty set. Also, use of such a full set is almost akin to those
classical sequential sampling techniques which select one observation
from each category prior to each decision step.

The general conclusion from this simulation is that both the
density of set B and the optimum stopping point n* are primarily
dependent on the total number of students available, N, and less
dependent on the size of k, $\sigma$, contrasts, and sampling costs (where
these are moderate percentages of $\mu_K$). This conclusion can be
inferred more readily from some graphs than from Tables 3 and 4;
and Figures 2, 3 and 4 illustrate results for the typical case of k = 4,
$\sigma$ = 10, and contrast = 10. From these figures it appears that the
larger the N, the more dense should set B be, and the longer should
one keep on sampling. If, however, N is determined by some
decision process outside of the system (i.e., the experiment may
be terminated at any $n \le N$), then Figure 5 shows that the empty
set B is best.



The question now arises: for Case I, if one starts with Rule 7
and an a priori forcing set $B_j$, is it possible to modify this set as
one gains information on the $\pi_j$? Since the forcing set is introduced
to reduce the probability of being "trapped" in the wrong category,
sample values are useless in determining what this set $B_j$ should be,
unless one wishes to make additional assumptions about the
distributions of the $\pi_j$. A possible assumption is that all the $\pi_j$
have the same distribution, only differing by the value of a
parameter, say the means, $\mu_j$. With this assumption, a point could be
reached where sample values from the category with max $n_j$ could be
used to estimate the nature of the distributions. At this point,
however, instead of changing set $B_j$ it would probably be advisable to
shift the problem to Case II, Case III, or other cases appropriate to
the underlying distributions of the $\pi_j$.

[Figure 5. Expected sums, non-optional stopping]

Actually, it is hardly conceivable that Case I conditions could
exist in any real educational system. Some Case II situations exist,
but the preponderance of situations falls into Case III. Anticipating the
results of later sections, it can be stated with a fair degree of
certainty that the X's that will be used for the adaptive decision
structure will be approximately normally distributed. Why then consider
Case I? The reason is that the decision rules used for Case I require
relatively little computational work (or hardware), whereas Case III
decision rules may require a tour de force in computation and
analytical techniques or hardware that does not currently exist. It is
partly for this reason that the empirical studies in Case I were made
on normal distributions, for if decision rules derived for the
non-parametric case yield results not too inferior to those obtained
from the more difficult Case III decision rules, then there could be
some practical advantages to using the simpler rules.

Only brief mention will be made of the Case II problems,
since in complexity they fall between Case I and Case III, and
techniques developed for Case III can be used with some simplification
for Case II.

Case II B. R. N. Bradt, S. M. Johnson, and S. Karlin [7] considered
the special case of devising a sequential design which would maximize
the expected value of the sum of n observations from two binomially
distributed populations when the expected values of the distributions
are known, though only an a priori probability is given to indicate
which expected value is associated with which distribution. This
special case was popularly called the "Two-Armed Bandit Problem"
from its similarity to a familiar gambling situation.

R. Bellman [14] and M. Sakaguchi [15] couched the same 
special case in dynamic programming terminology. 

Walter Vogel [16] considered the same special case and further 
examined this problem with the additional restriction that k observa- 
tions are initially made on each of the two populations before the 
sequential sampling rule is employed [17]. 

Finally, Dorian Feldman [18] showed that for both a specified
number, N, of observations and for an infinite number of observations,
the optimum (in the expected value sense) decision rule is to always
select the n-th observation from that category for which the Bayesian
posterior probability at n - 1 is greatest.
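The posterior-probability rule can be illustrated for the two-population binomial case. The sketch below is an assumption-laden illustration rather than Feldman's own formulation: it supposes two known success probabilities `p_hi > p_lo`, an a priori probability `xi` that arm 0 is the better arm, and a standard Bayes update after each observation. All names are hypothetical.

```python
def update(xi, arm, success, p_hi, p_lo):
    """Posterior probability that arm 0 is the better arm after one observation.

    xi      -- prior probability that arm 0 has success probability p_hi
    arm     -- which arm was observed (0 or 1)
    success -- True if the observation was a success
    """
    p_if_h = p_hi if arm == 0 else p_lo      # this arm's success prob. if arm 0 is better
    p_if_not = p_lo if arm == 0 else p_hi    # ... if arm 1 is better
    like_h = p_if_h if success else 1 - p_if_h
    like_not = p_if_not if success else 1 - p_if_not
    return xi * like_h / (xi * like_h + (1 - xi) * like_not)

def choose(xi):
    """Select the arm with the greater posterior probability of being better."""
    return 0 if xi >= 0.5 else 1

if __name__ == "__main__":
    xi = update(0.5, 0, True, 0.8, 0.2)   # a success on arm 0 raises xi to 0.8
    print(xi, choose(xi))
```

A success on an arm raises the posterior probability that it is the better one, and a failure lowers it, so the rule keeps sampling whichever arm the data currently favor.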

Case III a C. Several approaches are available in the case where the
$\pi_j$ are assumed to be normally distributed and differing only in the
(unknown) value of $\mu_j$. One approach, often suggested, will be
excluded from consideration at the outset. This is the two-action
sequential approach of determining which of k categories has the
highest mean and then assigning all remaining observations to that
category. Bechhofer [19], Paulson [20], Fabian [21] and Dunnett [22]
have made interesting contributions to this problem. In this approach,
the problem of the trade-off between information gained from taking
observations from categories with sample means less than
$\max_j {}_n\bar{X}_j$ and the loss in expected return from taking such
observations is handled by requiring the experimenter to state, before
the process begins, values of $\delta^*$ and $P^*$, where $\delta^*$ is the
smallest difference $\mu_K - \mu_{K-1}$ that is worth detecting, and $P^*$
is the smallest acceptable value of the probability of selecting
$\pi_K$ when actually $\delta \ge \delta^*$.



The difficulty with this approach is that when one wishes to maximize
$S_N$, then $\delta^*$ and $P^*$ are functions of the unknown $\mu_j$ and
cannot be specified unless there are a priori measures on the $\mu_j$.
Furthermore, this approach requires taking observations from each
category at each trial $n < n^*$, an obviously non-optimal procedure.

There is another two-action sequential approach which also
requires taking observations from each category at each trial $n < n^*$
but which does not require a priori statements on $\delta^*$ or $P^*$.

Case III a 4 C ii. For $X_{ij}$ normally distributed with equal known
variances, let $n^*$ be redefined as the number of trials, where each
trial consists in taking one observation from each of the k populations.
Then, ignoring set-up and sampling costs, the expected loss when making
$n^*$ trials is

    n^* \sum_{j=1}^{k} \delta_j , \quad \text{where } \delta_j = \mu_K - \mu_j .

If the sampling process stops at $n^*$ and the expected loss from taking
the remaining $(N - kn^*)$ observations from a $\pi_j \ne \pi_K$ is
$(N - kn^*)\,\delta_j\,({}_{n^*}P_j)$, the total expected loss is

    E(L) = n^* \sum_{j=1}^{k} \delta_j + (N - kn^*) \sum_{j \ne K} \delta_j \,({}_{n^*}P_j)

Maurice [23] considers the case where k = 2 and draws on Girschick's [24]
earlier work indicating that sets of sample values
$X_{11}, X_{21}, \ldots, X_{n1}$ and $X_{12}, X_{22}, \ldots, X_{n2}$ yielding
sample means $\bar{X}_1$ and $\bar{X}_2$ can be identified with two
population means $\mu_a$ and $\mu_b$ as:

    \lambda = \frac{\phi(\bar{x}_1 \mid a)\,\phi(\bar{x}_2 \mid b)}{\phi(\bar{x}_1 \mid b)\,\phi(\bar{x}_2 \mid a)}

where

    \phi(\bar{x} \mid \mu) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{n}{2\sigma^2} (\bar{x} - \mu)^2 \right]

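The sequential rule in this case is to continue sampling as long as
$B < \lambda < A$. Since a loss results whether $\delta$ is positive or
negative, $B = 1/A$, or sampling should continue as long as

    \frac{1}{A} < \exp\left[ \frac{n\delta}{\sigma^2} (\bar{X}_1 - \bar{X}_2) \right] < A

Taking the logarithm of this, with $a = \ln A$, gives

    -a < \frac{n\delta}{\sigma^2} (\bar{X}_1 - \bar{X}_2) < a

or

    -\frac{a\sigma^2}{\delta} < \sum_{i=1}^{n} (X_{i1} - X_{i2}) < \frac{a\sigma^2}{\delta}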
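The reduction of the likelihood ratio to a condition on the summed paired differences can be checked directly. The following is a sketch only: it assumes that $\phi$ is the normal density of a sample mean of n observations (variance $\sigma^2/n$) and writes $\delta = \mu_a - \mu_b$, a labelling chosen here for illustration.

```latex
\begin{align*}
\lambda &= \frac{\phi(\bar{x}_1 \mid \mu_a)\,\phi(\bar{x}_2 \mid \mu_b)}
                {\phi(\bar{x}_1 \mid \mu_b)\,\phi(\bar{x}_2 \mid \mu_a)} \\
        &= \exp\Big[-\tfrac{n}{2\sigma^2}\big((\bar{x}_1-\mu_a)^2 + (\bar{x}_2-\mu_b)^2
                 - (\bar{x}_1-\mu_b)^2 - (\bar{x}_2-\mu_a)^2\big)\Big] \\
        &= \exp\Big[\tfrac{n(\mu_a-\mu_b)}{\sigma^2}\,(\bar{x}_1-\bar{x}_2)\Big]
         = \exp\Big[\tfrac{\delta}{\sigma^2}\sum_{i=1}^{n}(X_{i1}-X_{i2})\Big].
\end{align*}
```

The squared terms in $\bar{x}_1$ and $\bar{x}_2$ cancel, which is why only the difference of the two sample means (equivalently, the running sum of paired differences) enters the stopping rule.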



The average expected sample number (ASN), designated
here by $n^{**}$, for assumed large N is:

    n^{**} = \frac{\sigma^2 a\,(\exp[a] - 1)}{\delta^2 (\exp[a] + 1)}

and

    P = \frac{1 - \exp[-a]}{\exp[a] - \exp[-a]} = \frac{1}{\exp[a] + 1}



Maurice then substitutes $n^{**}$ for $n^*$ and $P$ for ${}_{n^*}P_j$ in the
expression for E(L), maximizes with respect to $\delta$, minimizes with
respect to $a/\delta$, and finds a solution of the form:

$D_1$: If $\sum_{i=1}^{n} (X_{i1} - X_{i2}) > 0.5842\,\sigma\sqrt{N}$,
make all subsequent observations from $\pi_1$.

$D_2$: If $\sum_{i=1}^{n} (X_{i1} - X_{i2}) < -0.5842\,\sigma\sqrt{N}$,
make all subsequent observations from $\pi_2$.

$D_0$: If $-0.5842\,\sigma\sqrt{N} \le \sum_{i=1}^{n} (X_{i1} - X_{i2}) \le 0.5842\,\sigma\sqrt{N}$,
take another set of k observations.
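As an illustration only, the three-way rule can be written as a small function. This is a minimal sketch: the function name, argument names and return labels are hypothetical, and it assumes k = 2 with a known common standard deviation.

```python
import math

def two_stage_decision(paired_differences, sigma, total_n, constant=0.5842):
    """Apply the D1 / D2 / D0 rule to the paired differences X_i1 - X_i2 seen so far.

    paired_differences: observed values of X_i1 - X_i2 (hypothetical input format)
    sigma:              known common standard deviation
    total_n:            N, the size of the total available sample
    constant:           stopping constant (0.5842 when sampling costs are ignored)
    """
    running_sum = sum(paired_differences)
    bound = constant * sigma * math.sqrt(total_n)
    if running_sum > bound:
        return "pi_1"      # D1: all subsequent observations from population 1
    if running_sum < -bound:
        return "pi_2"      # D2: all subsequent observations from population 2
    return "continue"      # D0: take another pair of observations
```

For example, with sigma = 1 and N = 100 the bound is 5.842, so ten paired differences of 1.0 each would stop sampling in favour of the first population.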

However, in the current application, and in other industrial
applications, another cost should be included, and that is the cost of
taking observations. This cost has not been included in the formulations
of Maurice and others, and is derived here in Case III a 4 C ii*.

Case III a 4 C ii*. The conditions for this case are the same as those
for Case III a 4 C ii, except there is the additional expected loss
attributable to the cost of sampling, or taking observations. For the
case of k = 2

    E(L) = n^* \delta + n^* C + (N - 2n^*)\,\delta\,({}_{n^*}P_j), \quad j \ne K

where C is the cost of taking observations on each pair of $X_{i1}$, $X_{i2}$.

If C can be stated as a percentage of the $\delta$, i.e., $C = p\delta$,

    E(L) = n^* \delta (1 + p) + (N - 2n^*)\,\delta\,({}_{n^*}P_j)

and letting c = 1 + p

    E(L) = n^* \delta c + (N - 2n^*)\,\delta\,({}_{n^*}P_j)

Following Maurice's procedure, substitute $n^{**}$ for $n^*$ and $P$ for
${}_{n^*}P_j$ in the above expression, and let:




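    \ell = \frac{a}{\delta}

    \therefore\; E(L) = \frac{\delta c \sigma^2 \ell\,(\exp[\ell\delta]-1)}{\delta\,(\exp[\ell\delta]+1)}
                 + \left[ N - \frac{2\sigma^2 \ell\,(\exp[\ell\delta]-1)}{\delta\,(\exp[\ell\delta]+1)} \right]
                   \frac{\delta}{\exp[\ell\delta]+1}

    E(L) = \frac{N\delta}{\exp[\ell\delta]+1}
           - \frac{2\sigma^2 \ell\,(\exp[\ell\delta]-1)}{(\exp[\ell\delta]+1)^2}
           + \frac{c\sigma^2 \ell\,(\exp[\ell\delta]-1)}{\exp[\ell\delta]+1}

To solve for $\ell$:

    \frac{\partial E(L)}{\partial \delta}
      = \frac{N(\exp[\ell\delta]+1-\ell\delta\exp[\ell\delta])}{(\exp[\ell\delta]+1)^2}
        - \frac{2\sigma^2\ell^2\exp[\ell\delta]}{(\exp[\ell\delta]+1)^2}
        + \frac{4\sigma^2\ell^2\exp[\ell\delta]\,(\exp[\ell\delta]-1)}{(\exp[\ell\delta]+1)^3}
        + \frac{2c\sigma^2\ell^2\exp[\ell\delta]}{(\exp[\ell\delta]+1)^2}

Setting this equal to zero and substituting $x = \exp[\ell\delta]$, $\ln x = \ell\delta$:

    \frac{N}{\sigma^2} = \frac{2\ell^2 x\,(3-x-c-cx)}{(x+1)(x+1-x\ln x)}

    \frac{\partial E(L)}{\partial \ell}
      = -\frac{N\delta^2\exp[\ell\delta]}{(\exp[\ell\delta]+1)^2}
        - \frac{2\sigma^2(\exp[\ell\delta]-1)}{(\exp[\ell\delta]+1)^2}
        - \frac{2\sigma^2\ell\delta\exp[\ell\delta]}{(\exp[\ell\delta]+1)^2}
        + \frac{4\sigma^2\ell\delta\exp[\ell\delta]\,(\exp[\ell\delta]-1)}{(\exp[\ell\delta]+1)^3}
        + \frac{c\sigma^2(\exp[\ell\delta]-1)}{\exp[\ell\delta]+1}
        + \frac{2c\sigma^2\ell\delta\exp[\ell\delta]}{(\exp[\ell\delta]+1)^2} = 0

    \frac{N}{\sigma^2} = \frac{-2(x-1)(x+1) - 2x\ln x\,(x+1) + 4x\ln x\,(x-1) + c(x-1)(x+1)^2 + 2cx\ln x\,(x+1)}{\delta^2\, x\,(x+1)}

Equating the $N/\sigma^2$ (with $\delta = \ln x/\ell$) and simplifying yields

    x\ln x\,(4x-8) - c\,x\ln x\,(x+1)(x-3) - 2(x-1)(x+1) + c(x-1)(x+1)^2 = 0

or, in terms of the percentage of $\delta$

    4x\ln x\,(x-2) - (1+p)\,x\ln x\,(x+1)(x-3) - 2(x-1)(x+1) + (1+p)(x-1)(x+1)^2 = 0

Solving this equation for x at different values of p, and substituting
these back into the expression for $N/\sigma^2$, the required solution for
$\ell$ (tabulated in the normalized form $\ell\sigma/\sqrt{N}$) is found from

    \frac{\ell\sigma}{\sqrt{N}} = \sqrt{\frac{(x+1)(x+1-x\ln x)}{2x\,(2-2x-p-px)}}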



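The decision rule now is:

$D_1$: If $\sum_{i=1}^{n} (X_{i1} - X_{i2}) > \ell\sigma\sqrt{N}$, make all
subsequent observations from $\pi_1$.

$D_2$: If $\sum_{i=1}^{n} (X_{i1} - X_{i2}) < -\ell\sigma\sqrt{N}$, make all
subsequent observations from $\pi_2$.

$D_0$: If $-\ell\sigma\sqrt{N} \le \sum_{i=1}^{n} (X_{i1} - X_{i2}) \le \ell\sigma\sqrt{N}$,
take another pair of observations.

Here $\ell$ is the normalized stopping constant listed in Table 5.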

Table 5 below gives the x and $\ell$ solutions to the above equations for
different values of p.

TABLE 5

TWO-STAGE SEQUENTIAL STOPPING CONSTANTS

     p          x            l

    0.0     9.061169     0.584160
    0.2     8.517213     0.536543
    0.4     8.148601     0.498402
    0.6     7.883984     0.467080
    0.8     7.685562     0.440822
    1.0     7.531641     0.418433
    1.2     7.408969     0.399067
    1.4     7.309016     0.382113
    1.6     7.226079     0.367117
    1.8     7.156191     0.353734
    2.0     7.096525     0.341697
    2.2     7.045010     0.330798
    2.4     7.000092     0.320870
    2.6     6.960585     0.311777
    2.8     6.925579     0.303410
    3.0     6.894347     0.295677
    3.2     6.866313     0.288504
    3.4     6.841013     0.281825
    3.6     6.818063     0.275587
    3.8     6.797157     0.269744
    4.0     6.778032     0.264256
    4.2     6.760470     0.259088
    4.4     6.744288     0.254211
    4.6     6.729332     0.249599
    4.8     6.715466     0.245228
    5.0     6.702573     0.241078
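The constants in Table 5 can be recomputed numerically. The sketch below is not the author's program; it assumes the stopping equation has the factor (x - 3) in its second term (the form obtained by differentiating E(L), and the form the tabulated roots satisfy), and it uses a plain bisection on the interval [2, 50], which is an assumption chosen here to exclude the trivial root at x = 1. All function names are hypothetical.

```python
import math

def stopping_equation(x, p):
    """Left-hand side of the equation in x for relative sampling cost p (c = 1 + p)."""
    c = 1.0 + p
    u = x * math.log(x)
    return (u * (4.0 * x - 8.0)
            - c * u * (x + 1.0) * (x - 3.0)   # assumed (x - 3), see lead-in
            - 2.0 * (x - 1.0) * (x + 1.0)
            + c * (x - 1.0) * (x + 1.0) ** 2)

def solve_x(p, lo=2.0, hi=50.0, tol=1e-12):
    """Bisection for the non-trivial root; assumes a sign change on [lo, hi]."""
    f_lo = stopping_equation(lo, p)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = stopping_equation(mid, p)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def stopping_constant(x, p):
    """Normalized constant l*sigma/sqrt(N) from the root x."""
    numerator = (x + 1.0) * (x + 1.0 - x * math.log(x))
    denominator = 2.0 * x * (2.0 - 2.0 * x - p - p * x)
    return math.sqrt(numerator / denominator)

if __name__ == "__main__":
    for p in (0.0, 1.0, 2.0):
        x = solve_x(p)
        print(p, round(x, 6), round(stopping_constant(x, p), 6))
```

For p = 0 and p = 1 the computed pairs should agree with the tabulated (9.061169, 0.584160) and (7.531641, 0.418433) to within about 1e-4.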






Case III a 3 C i. A case which is of particular importance in
educational (and other) systems arises when, by the nature of the process
involved, observations will be made on each of the available N students
(or experimental subjects, Ss.) and the k populations are known
to be independently normally distributed. In this case, the analytical
solution for the problem of maximizing $S_N$ involves evaluating a
(k - 1)-multinormal distribution, tabulated values of which are not
available for k > 3. However, by stating the problem in recursive
form, a numerical solution is feasible. Such a solution, using a
backwards-induction technique, is developed here.

In this case, the $\pi_1, \pi_2, \ldots, \pi_j, \ldots, \pi_k$ populations are all
independently normally distributed with random variables $x_j$, known
variances $\sigma_j^2$, and unknown means $\mu_j$. Let $n_j$ be the number of
observations from $\pi_j$ at the n-th stage. Therefore,
$n = n_1 + n_2 + \cdots + n_j + \cdots + n_k$. Let $\tilde{n} = N - n$ be the
number of observations remaining after the n-th observation has been
made, and let $S_{\tilde{n}}$ be the sum of the remaining observations. A
k-dimensional decision tree can be imagined where each branch point in
the tree is identified by the k-tuple $(n_1, n_2, \ldots, n_j, \ldots, n_k)$,
corresponding to the number of previous observations taken from
each $\pi_j$. Also, after n observations have been made, there will exist
a k-tuple of sample means
$({}_n\bar{X}_1, {}_n\bar{X}_2, \ldots, {}_n\bar{X}_j, \ldots, {}_n\bar{X}_k)$
corresponding to that $(n_1, n_2, \ldots, n_j, \ldots, n_k)$ branch point
actually obtained at the n-th stage. The sample means can be just as
readily identified by the number of stages remaining, or instead of
${}_n\bar{X}_j$, ${}_{\tilde{n}}\bar{X}_j$ can be used. Given
$(n_1, n_2, \ldots, n_j, \ldots, n_k)$ and
$({}_{\tilde{n}}\bar{X}_1, {}_{\tilde{n}}\bar{X}_2, \ldots, {}_{\tilde{n}}\bar{X}_j, \ldots, {}_{\tilde{n}}\bar{X}_k)$
at each stage, the principle of optimality in dynamic programming [26]
would indicate for this case that an optimal decision rule is one which
maximizes the sum of the remaining observations, regardless of what
path or what decision rules one followed in arriving at the two state
k-tuples. Therefore, the problem can be restated as one where
$S_{\tilde{n}}$ must be maximized at each stage, where:

    [S_{\tilde{n}} \mid (n_1, \ldots, n_j, \ldots, n_k),\; ({}_{\tilde{n}}\bar{X}_1, \ldots, {}_{\tilde{n}}\bar{X}_j, \ldots, {}_{\tilde{n}}\bar{X}_k)]
        = {}_{\tilde{n}-1}x + {}_{\tilde{n}-2}x + \cdots + {}_{0}x

and the expected value of $S_{\tilde{n}}$ is defined as:

    E[S_{\tilde{n}} \mid (n_1, \ldots, n_k),\; ({}_{\tilde{n}}\bar{X}_1, \ldots, {}_{\tilde{n}}\bar{X}_k),\; D]
        = E[{}_{\tilde{n}-1}x + {}_{\tilde{n}-2}x + \cdots + {}_{0}x \mid D]
        = E[{}_{\tilde{n}-1}x \mid D] + E[S_{\tilde{n}-1} \mid D]
where the decision rule D is: Select the $(n + 1)$st $= (\tilde{n} - 1)$st
observation from the $\pi_j$ which has the maximum expected value of the
remaining observations. This can be expressed by the recurrence
relationship:

    E[S_{\tilde{n}}] = \max_{j = 1, 2, \ldots, k} \Big\{ E[{}_{\tilde{n}-1}x_j]
        + E_j\big[ S_{\tilde{n}-1} \mid (n_1, n_2, \ldots, n_j + 1, \ldots, n_k),\;
          ({}_{\tilde{n}}\bar{X}_1, {}_{\tilde{n}}\bar{X}_2, \ldots, {}_{\tilde{n}-1}\bar{X}_j, \ldots, {}_{\tilde{n}}\bar{X}_k),\; D \big] \Big\}

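Using the implicit assumption of Raiffa and Schlaifer [25] that
unknown population means be treated as a Gaussian distributed random
variable with mean of ${}_{\tilde{n}}\bar{X}_j$ and variance of $\sigma_j^2/n_j$, i.e.,
$G[\mu_j \mid {}_{\tilde{n}}\bar{X}_j,\ \sigma_j^2/n_j]$, then:

    E[S_{\tilde{n}}] = \max_j \Big\{ {}_{\tilde{n}}\bar{X}_j + E_j\big[ S_{\tilde{n}-1} \mid (n_1, n_2, \ldots, n_j + 1, \ldots, n_k),\;
          ({}_{\tilde{n}}\bar{X}_1, \ldots, {}_{\tilde{n}-1}\bar{X}_j, \ldots, {}_{\tilde{n}}\bar{X}_k),\; D \big] \Big\}
          \quad \text{for } j = 1, 2, \ldots, k

                     = \max_j \{ {}_{\tilde{n}}\bar{X}_j + E_j[S_{\tilde{n}-1}] \}

where the new mean is given by

    {}_{\tilde{n}-1}\bar{X}_j = \frac{n_j\,({}_{\tilde{n}}\bar{X}_j) + {}_{\tilde{n}-1}x_j}{n_j + 1}

and is Gaussian distributed with mean ${}_{\tilde{n}}\bar{X}_j$ and variance

    \frac{\sigma_j^2}{n_j} - \frac{\sigma_j^2}{n_j + 1} = \frac{\sigma_j^2}{n_j (n_j + 1)}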
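The update of a sample mean by one further observation, and the spread of the updated mean as seen from the current stage, can be illustrated as follows. This is a minimal sketch with hypothetical function names; it simply restates the two formulas above for a single population with known standard deviation.

```python
import math

def updated_mean(current_mean, n_obs, new_value):
    """Sample mean after one more observation is added to n_obs earlier ones."""
    return (n_obs * current_mean + new_value) / (n_obs + 1)

def updated_mean_sd(sigma, n_obs):
    """Standard deviation of the updated mean about the current mean,
    i.e. the square root of sigma^2 / (n_obs * (n_obs + 1))."""
    return sigma / math.sqrt(n_obs * (n_obs + 1))
```

The spread shrinks quickly with the number of observations already taken: with sigma = 1 it is about 0.71 after one observation and about 0.10 after ten, which is why further sampling changes a well-observed mean very little.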



To solve the problem, the following backwards-induction
procedure is employed:

(a) Start at the end of the decision tree, where
    $n_1 + n_2 + \cdots + n_j + \cdots + n_k = n = N$, and therefore
    $\tilde{n} = 0$. At $\tilde{n} = 0$, $E[S_0] = 0$, and there is no decision
    to make since no observations remain to be made.

(b) Move back down the decision tree to the
    $(n_1, n_2, \ldots, n_j, \ldots, n_k)$ state points located at
    $\tilde{n} = 1$. At each of these state points

        E[S_1] = \max_j \{ {}_1\bar{X}_j + E_j[S_0] \} = \max_j \{ {}_1\bar{X}_j \}

    If one were moving forward along a path in the decision tree,
    then at $\tilde{n} = 1$ the $(n_1, n_2, \ldots, n_j, \ldots, n_k)$,
    $({}_1\bar{X}_1, {}_1\bar{X}_2, \ldots, {}_1\bar{X}_j, \ldots, {}_1\bar{X}_k)$
    would be known and it would be a simple matter to select
    $\max_j \{ {}_1\bar{X}_j \}$. However, in the backwards-induction the
    $({}_1\bar{X}_1, {}_1\bar{X}_2, \ldots, {}_1\bar{X}_j, \ldots, {}_1\bar{X}_k)$
    are not known, and therefore the exhaustive procedure of considering
    all possible combinations of
    $({}_1\bar{X}_1, {}_1\bar{X}_2, \ldots, {}_1\bar{X}_j, \ldots, {}_1\bar{X}_k)$
    will be used. If each ${}_1\bar{X}_j$ is examined at q discrete points
    in the range between $-\infty$ and $+\infty$, then at each
    $(n_1, n_2, \ldots, n_j, \ldots, n_k)$ state point there are $q^k$ cells,
    arranged in a k-dimensional array. Each cell corresponds to one of
    the possible discrete combinations of
    $({}_1\bar{X}_1, {}_1\bar{X}_2, \ldots, {}_1\bar{X}_j, \ldots, {}_1\bar{X}_k)$
    and in each cell one can insert the value of the maximum of the
    means corresponding to that cell. Also, one can record the
    identification of the population associated with the maximum mean
    for each cell. Therefore D is exhaustively determined for each
    possible combination of
    $({}_1\bar{X}_1, {}_1\bar{X}_2, \ldots, {}_1\bar{X}_j, \ldots, {}_1\bar{X}_k)$
    at each possible $(n_1, n_2, \ldots, n_j, \ldots, n_k)$.

(c) Move back down the decision tree to the state points at
    $\tilde{n} = 2$. Here,

        E[S_2] = \max_j \{ {}_2\bar{X}_j + E_j[S_1] \}

    Since values for $E[S_1]$ have been stored in the $q^k$ cells at
    $\tilde{n} = 1$, the $E_j[S_1]$ is computed from the sum of the products
    of the stored values with their probabilities of occurrence. The
    distribution of the ${}_1\bar{X}_j$ is
    $G[{}_1\bar{X}_j \mid {}_2\bar{X}_j,\ \sigma_j^2/n_j(n_j+1)]$, and in order
    to find probability weightings for each of the q discrete points
    that the distribution range has been divided into, a quadrature
    based on a Hermite polynomial approximation to the integrand
    can be used, such that

        \int_{-\infty}^{\infty} \exp[-z^2]\,\{\exp[z^2]\,g(z)\}\,dz
            \approx \sum_{i=1}^{q} a_i \exp[z_i^2]\,g(z_i) = \sum_{i=1}^{q} W_i\,g(z_i)

    where $z_i$, $a_i$ and $W_i$ are tabled [27] for quadratures up to
    q = 20.

(d) Step (c) is repeated for $\tilde{n} = 3, 4, \ldots, N-k$; where $N-k$
    corresponds to the starting state point $(1, 1, \ldots, 1, \ldots, 1)$.
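The recursion in steps (a) to (d) can be sketched in a few lines for very small problems. This is a hedged illustration and not the program described in the text: instead of storing $q^k$ cells at every state point it recurses directly, and it uses a fixed 3-point Gauss-Hermite rule for a standard normal variable (nodes 0 and plus or minus the square root of 3, weights 2/3 and 1/6) in place of the tabled quadratures up to q = 20. The function and argument names are hypothetical.

```python
import math

# 3-point Gauss-Hermite rule for a standard normal variable (assumed here for brevity).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def expected_remaining_sum(remaining, counts, means, sigmas):
    """Return (E[S_remaining], index of the population to sample next).

    remaining: number of observations still to be taken
    counts:    tuple of observations already taken from each population
    means:     tuple of current sample means
    sigmas:    tuple of known standard deviations
    """
    if remaining == 0:
        return 0.0, None                      # step (a): nothing left to decide
    best_value, best_index = -math.inf, None
    for j in range(len(means)):
        # Spread of the updated mean of population j about its current mean.
        sd = sigmas[j] / math.sqrt(counts[j] * (counts[j] + 1))
        continuation = 0.0
        if remaining > 1:
            for z, w in zip(NODES, WEIGHTS):
                new_means = means[:j] + (means[j] + sd * z,) + means[j + 1:]
                new_counts = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                value, _ = expected_remaining_sum(
                    remaining - 1, new_counts, new_means, sigmas)
                continuation += w * value
        total = means[j] + continuation       # current mean plus expected future sum
        if total > best_value:
            best_value, best_index = total, j
    return best_value, best_index
```

The cost of this direct recursion grows as (3k) to the power of the number of remaining observations, so it is only usable for a handful of stages; the stored-cell scheme in the text trades that time for storage.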

The solution then consists of $q^k$ cells at each state point in
the decision tree, each cell corresponding to one of the possible
combinations of $(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_j, \ldots, \bar{X}_k)$,
and in each cell is a D telling which population to take the next
instruction from. While a solution as outlined above is feasible, the
computational time and output storage requirements are excessive. For
example, to store the D in each of the cells requires

k l r 

jjy n (N-l+a) 

* a=l 

storage locations. The number of required storage locations can be 
reduced by observing that: 

i. Initially, one observation is taken from each $\pi_j$.

ii. At points in the decision tree where an equal number of 
observations have been taken from each population, and 
at the n = 1 stage, the next observation will be taken from 
the population having the largest sample mean. 

This reduces the number of storage locations to 

k -i f 1* * or N even 

qk [ b • 1 + F! a n 4 (N " k ' 2+a) J : Where b Hi. 5. for N odd 

For equal population variances the number of storage locations
can be further reduced by a factor of k. Nevertheless, for
example, for k = 3, q = 16, N = 500, and equal variances, a minimum
of 4.19 x 10^10 storage locations are required!



It is possible to make a significant reduction in output storage
space requirements and in computation time by the reparameterization
described below:

Define a set of superscripts (a, b, ..., h, ..., i) such that:

    \bar{X}^a > \bar{X}^b > \cdots > \bar{X}^h > \cdots > \bar{X}^i

At each $\tilde{n}$-th stage reassign the set of superscripts to the
$\pi_j$, $\bar{X}_j$ and $\sigma_j$. Therefore, a given superscript need not be
associated with the same subscript from stage to stage. Also define

    U_{\tilde{n}} \mid (n_1, n_2, \ldots, n_j, \ldots, n_k),\;
        ({}_{\tilde{n}}\bar{X}_1, {}_{\tilde{n}}\bar{X}_2, \ldots, {}_{\tilde{n}}\bar{X}_j, \ldots, {}_{\tilde{n}}\bar{X}_k)
        = \min_j \{ \ldots \}
    [the expression inside the braces is not legible in the source]

where

    \Delta_j = 0, for \pi_j = \pi^a
    \Delta_j = \frac{\bar{X}^a - \bar{X}_j}{\sigma^a}, for \pi_j \ne \pi^a

For the sake of simplicity, the case of k = 2 will be used in
the following exposition. Therefore $\pi^a$, $\sigma^a$, and $n^a$ will correspond
to the larger sample mean $\bar{X}^a$, and $\pi^b$, $\sigma^b$ and $n^b$ will
correspond to the smaller sample mean $\bar{X}^b$. In terms of the new
variables:

    U_{\tilde{n}} \mid (\Delta, n^a, n^b, \sigma) = \min_h \{ \ldots \}
    [the expression inside the braces is not legible in the source]

where

    \Delta^h = 0, for h = a
    \Delta^h = \Delta = \frac{\bar{X}^a - \bar{X}^b}{\sigma^a}, for h = b

To get a recursion in terms of U, expand the expression inside 
the ( ) brackets: 




For h = a, two cases can arise; either ${}_{\tilde{n}-1}x$ results in a
${}_{\tilde{n}-1}\bar{X}^a$ which comes from the same population as
${}_{\tilde{n}}\bar{X}^a$, or ${}_{\tilde{n}-1}x$ results in a ${}_{\tilde{n}-1}\bar{X}^a$
which comes from a different population than ${}_{\tilde{n}}\bar{X}^a$. For the
former case, the first quantity inside the ( ) brackets is equal to:




In the latter case this quantity is equal to: 
a (n a +l)




~x a -^X a 

transforming both expressions by y 

(ja. 



and combining terms: 



- J yG^y|o, l/n a (n a +l)jdy + A J G y|o, 1 /n a (n a + 1 ) J dy 

= -J y G^y |o, l/n a (n a +l)Jdy + 

-A 

A (I G [y l°* 1/na < n3+1) ] d y - 1 . G [ y l°* 1 /“ a (» a + i )]dy^ 



= A^* (y + A) G^y|0, l/n a (n a +1) Jdy 
transforming by 6 = y + A gives: 

A-^* 6g£ 6|A* l/n a (n a +1) Jd6 

In the two cases described above, the second quantity in the ( ) 
brackets is equal to: 

J [ U n-1 ( 6 l na+1 » nb )] G [ 6 l A » l/n a (n a +l)Jd« + 

~a i* |n' > » na+ l)jG^ 4 |-A, l./n a (n a +l) jd4 

Similarly, for h = b, under the two cases, the first quantity in the 
( ) brackets is equal to: 




and transforming now by 6 = y - A gives: 



- 6 G [ 6 |-A, f-UnV+l) ] d 6 



The second quantity in the brackets is equal to: 

4 fI U n-l ( 5 |n b +l.n a )]o[i |-A, ^/nV+l^d 6 
n^(n^+l)J d 6



Collecting terms: 




An attempt was made to numerically solve the above integrals
by using a Gaussian quadrature of the following type:

    \int_{-1}^{1} f(z)\,dz \approx \sum_{i=1}^{q} a_i f(z_i)

where $z_i$ and $a_i$ are tabled [28] for q = 1 to 48. Since the limits
on the integrals are from 0 to $\infty$ and not from -1 to +1, the
following transformation was employed.

    [the transformation from $\delta$ to z and its Jacobian $d\delta/dz$ are not legible in the source]



This quadrature approximation of the integral is not accurate
for values of $\Delta$ approaching $\infty$. At first it was felt that this
would not be of serious consequence, since for very large $\Delta$ one would
always select the population which contained $\bar{X}^a$. However, the
error is multiplicative as one steps back through the N iterations of
the recursion formula, and significant errors occur. This problem was
overcome by using an approximately exponential grid spacing for the
$\Delta$, and at each $\Delta$ grid point approximating the integral by using
tabled Gaussian probability values associated with seventeen equally
spaced abscissa points in the range of $-3.2\sigma$ to $+3.2\sigma$.
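One way to picture the seventeen-point approximation is shown below. This is a sketch under stated assumptions, not the author's routine: it assumes the "tabled Gaussian probability values" are the probability masses of equal-width bins centred on the abscissas (spacing 0.4 sigma), renormalized to sum to one, and it computes those masses with the error function rather than reading them from a table. All names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def equally_spaced_grid(half_width=3.2, points=17):
    """Abscissas (in sigma units) and normalized bin probabilities."""
    step = 2.0 * half_width / (points - 1)
    nodes = [-half_width + i * step for i in range(points)]
    masses = [normal_cdf(z + step / 2.0) - normal_cdf(z - step / 2.0) for z in nodes]
    total = sum(masses)
    return nodes, [m / total for m in masses]

def gaussian_expectation(func, mean, sd):
    """Approximate E[func(X)] for X normal with the given mean and sd."""
    nodes, weights = equally_spaced_grid()
    return sum(w * func(mean + sd * z) for z, w in zip(nodes, weights))
```

Because the grid is tied to the distribution's own mean and standard deviation rather than to a fixed range of the integration variable, the accuracy does not degrade as the separation between the two means grows, which is the failure the text reports for the transformed quadrature.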

The results confirm the intuitive notion that for $n^a > n^b$ one
should always select the next observation from $\pi^a$. For $n^a \le n^b$
there will exist a range of $\Delta$, from $\Delta = 0$ to a critical $\Delta$,
$\Delta_c$, for which a choice from $\pi^b$ has a smaller expected
$U_{\tilde{n}}$ than does a choice from $\pi^a$. Therefore, on the decision
tree one can associate with each $(n^a, n^b)$ branch point a $\Delta_c$.
Having once determined the $\Delta_c$ for all branch points in the decision
tree, the experimenter merely calculates the actual "$\Delta_E$" obtained
in his experiment at a particular $(n^a, n^b)$ and compares the tabled
$\Delta_c$ at that branch point with his $\Delta_E$.

If $\Delta_E < \Delta_c$ the next observation is taken from $\pi^b$. If
$\Delta_E \ge \Delta_c$, the next observation is taken from $\pi^a$.
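Applying a tabled critical value at a branch point reduces to one comparison. The sketch below is illustrative only; the function name and the convention of returning a zero-based population index are assumptions, and the critical value is taken as given (for example read from a table or contour map for the current branch point).

```python
def next_population(mean_1, mean_2, sigma_1, sigma_2, delta_critical):
    """Return 0 or 1: the index of the population to sample next (k = 2).

    The population with the larger current sample mean plays the role of
    'a'; the observed Delta is the difference of the two sample means in
    units of that population's standard deviation.
    """
    if mean_1 >= mean_2:
        larger, smaller, sigma_a = 0, 1, sigma_1
        delta_observed = (mean_1 - mean_2) / sigma_a
    else:
        larger, smaller, sigma_a = 1, 0, sigma_2
        delta_observed = (mean_2 - mean_1) / sigma_a
    # Below the critical value, sample the population with the smaller mean.
    return smaller if delta_observed < delta_critical else larger
```

For instance, with sample means 5.0 and 4.0 and a standard deviation of 2.0, the observed value is 0.5; a critical value of 0.8 sends the next observation to the population with the smaller mean, while a critical value of 0.3 keeps it with the larger.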

It is possible to show the results by a "topographic" map of
the critical $\Delta_c$ surface, as illustrated in Figure 6. At present, one
such map is required for each value of N. It remains to be seen
whether some simple transformation of scales, in terms of N, can be
used to obtain the $\Delta_c$ surface from one generalized map. The map,
of course, can only be drawn for k = 2. For larger k, the values of
$\Delta_c$ can be tabled or stored on magnetic tape.



The critical $\Delta_c$ surface is shown for k = 2, $\sigma^a = \sigma^b$, N = 58.
To use the figure, locate the grid point corresponding to the number of
observations, $n^a$, previously taken from the category with the larger
current sample mean and the number of observations, $n^b$, from the
category with the smaller current sample mean. At this grid point,
determine $\Delta_c$ by interpolation between the contour lines. If the
observed $\Delta_E$ (defined below) is less than $\Delta_c$, select the next
observation from the category with the smaller sample mean. If the
observed $\Delta_E$ is equal to or larger than $\Delta_c$, select the next
observation from the category with the larger sample mean.

    Observed \Delta_E = \frac{\text{Larger sample mean} - \text{Smaller sample mean}}{\text{Standard deviation of category with larger sample mean}}

DECISION SURFACE
FIGURE 6
Appendix B contains the flow diagram for the computer
program used in the above solution. In its current form, the program
requires an average of one-quarter of a second (on the IBM 7090) for
the computation of each $\Delta_c$. The program allows one to find a solution
for any initial starting point on the decision tree. For example, an
experimenter may have prior information on $n_1$ observations from
$\pi_1$ and $n_2$ observations from $\pi_2$ before he decides to use the
backwards-induction solution to determine an optimal path through the
remainder of the decision tree. Also, the program accommodates problems
in which the population variances are known but not equal. An interesting
extension of the backwards-induction technique described above would
be for the case where the variances are unknown.

In the application of the backwards-induction solution
discussed above, the only thing of interest was the $\Delta_c$. However, in
other applications, the value of $E[S_{\tilde{n}}]$ is required, and therefore
these values are also made available by the computer program.

It is conceivable that a library of solutions for different N and 
k can be obtained with this computer program. However, before any 
large scale project of this nature is undertaken, consideration should 
be given to the use of a hybrid analog-digital computer for the calcu- 
lation of the numerous integrals encountered in this problem . 

The final comment on this section is that even though some
interesting decision rules have been developed for maximizing the sum
of the net values associated with observations from k categories,
considerable further work can be done in extending and generalizing
both the two-stage and multi-stage sequential sampling plans. For
specified N and k and $\Delta$ (or $\delta$) it can be demonstrated that one or
the other of the various decision rules discussed in this section yields
the highest $E[S_N]$. However, the differences are not always large, and
the significance of the difference between $E[S_N]$ obtainable with
different decision rules cannot be evaluated without considering the
precision of the basic data and the utility function employed in
converting these data into "X" values. Therefore, it is now time to
examine the hitherto mysterious "X" quantities used in this and the
preceding sections.



SECTION IV 



A UTILITY FUNCTION FOR THE OUTPUT
OF EDUCATIONAL SYSTEMS

Up to this point, it has been suggested that in a situation where
students are being "educated" and simultaneously being used as
"experimental subjects", one should follow a decision rule which
tends to maximize the net output of all students going through the
system, that is, maximize

    S_n = {}_1X + {}_2X + \cdots + {}_nX

Some decision rules which tend to give maximum $S_n$ under
different conditions of a priori knowledge were also suggested.
However, the "net output", ${}_nX$, has remained ambiguous. This ${}_nX$
can be prescribed for different sets of conditions, some of which are
given below.

A. Minimum Conditions

i. A nominally described teaching-learning program.
ii. A numerically scaled student performance measure, where
equal distances on the scale have equal "value" and one end of
the scale has "higher value" than the other end of the scale
(a binary scale is permissible).

The number obtained for each student from the measure
described in ii is the X for use in the decision rule.

The minimum conditions given above are typical of almost all
currently reported educational experiments, where no attempt is
made to specify the relationship between costs of education or the
value of the subsequent life-productivity of the student and the school
performance measure.
These minimum conditions may suffice for making decisions
on micro-aspects of an educational system, but if one's actions are
to make sense when judged from outside of the system, then the
system inputs and outputs must be defined in value units which have
currency outside the system. This is not a new problem, but one that
has continuously plagued educators and has long been considered of
fundamental importance. The views have often been despairing. For
example, M. L. Jackson [29] noted the similarity between some
engineering and some educational processes. He suggested that "the
student is our 'product' in the manufacturing process of education.
The raw material varies, sometimes in an uncontrollable manner.
Classroom instruction is the process whereby the product is formed
and this phase is of overall importance. The final product cannot be
evaluated except after a number of years, and in most cases the
feedback is obtained too late, or not at all". If what Jackson says is
true, then very little meaningful analysis of such an educational
process is feasible. If a current value for the output of the educational
process stated in the same dimensions as the value of the inputs cannot
be found, and if differences in the output cannot be related to specific
differences in the transform, then the problem can only be resolved by
insight and intuition.

The problem can be illustrated by a simple example; given the
following data:

                                       Method A     Method B

    Average Final Examination Score       80           90
    Average Learning Time              9 months     6 months
    Cost per Student                    $1200        $2500

and the statement that differences between the average examination
scores and the average learning times for the two methods are
statistically significant, how does one determine which method to adopt?



How does one evaluate whether or not an increase in examination
score of 10 points is worth an increase in cost per student of $1300?
Or what "value" should be assigned to the three months' saving in
learning time possible with Method B? Is one justified in using some
combined measure of score and time, such as the commonly suggested
final score divided by learning time? Why not use final score divided
by the logarithm of learning time, or any other arbitrary weighting?

Partial answers to some of the aspects of this problem are
found in the recent literature on the measurement of educational
system outputs. Jones [30] used a rating of the individual graduate's
subsequent "success" as evaluated by his peers and also the
graduate's self-rating of satisfaction and achievement. Jones also
attempted to obtain evaluations (from teachers of the graduates) on the
contributions to society made by the individual graduate, and also on
how these contributions compared with the teacher's subjective
opinion of the potential capabilities of the graduate. However, there
is some question as to the validity and reliability of the above
measures.

Many investigators use life-cycle earnings of students as the
measure which is (somehow or other) related to school performance,
not because earnings are a more valid measure, but primarily
because they are a more reliable and more readily available measure.
Earnings are certainly not an ideal measure, since differences in 
income can be attributed not only to differences in the type, quality, 
and extent of education, but also to personality factors, regional 
factors, family contacts, etc. However, income has remained the 
most commonly used measure of the effect of education on student 
output. 



Machlup [31] conceives the educational system as a
knowledge-spreading industry and evaluates its economic efficiency. He
calculates that this industry in 1958 produced goods and services worth
$136.4 billion, and that all forms of education cost $60 billion, or 
almost 13% of the 1958 Gross National Product. He states that the 
total knowledge industry accounted for 29 percent of the Gross 
National Product and is now growing about two and one half times 
faster than the industries that produce all other kinds of goods and 
services. 

Becker [32] studied rate of return from college education,
allowing for the generally higher initial ability of the college student.
He found that the rate of return on the investment in college education
by urban white male students, including income foregone by the student
while attending school, was 12.5 percent in 1940 and 10 percent in 1950
before taxes. When the social cost of college education was added to
the individual cost, the rate of return in both years was about 9
percent before taxes. Schultz [33] estimated that the rate of return on
investment in college education in 1958 was 11 percent. He then
calculated the total years of education in the labor force, gave
appropriate weights to each level of education, and estimated that the
return on the total investment in education was 17.3 percent. Schultz,
like Becker, included income foregone in the total cost of education.
Both Becker and Schultz calculated on the basis of total resource
costs as well as on private resource costs.*

* Total resource costs include: (a) school costs incurred by society,
i.e., teachers' salaries, supplies, "rental" of buildings and grounds,
etc., (b) opportunity costs incurred by individuals, i.e., income
foregone during school attendance and (c) incidental school-related
expenditures paid by individuals, i.e., books, travel, etc.

Private resource costs include the same three components, except
that in (a) above tuition and fees paid by individuals are substituted
for society's costs, which are normally defrayed through taxation.



Hansen [34] has derived the internal rate of return for various 
levels of schooling from grade one to the completion of four years of 
college, and indicated that this measure provides a more useful 
method of ranking the economic returns to investment in schooling 
than do the more conventional lifetime or present value of lifetime 
income methods. Miller [35] computed the 1949 capital value of 
lifetime income according to years of schooling. Houthakker [36] 
estimated the present value of income streams associated with dif- 
ferent levels of schooling on the basis of alternate discount values. 

The view adopted here is that the investment which the individ- 
ual and society make in education yields a return in the form of an 
increase (or decrease) in the contributions which the educated individ- 
ual makes to his own well-being and to society throughout his later 
life and that current measures of student performance are indicators 
of the probable extent of these contributions. This view will be made 
more explicit, and methods for obtaining quantifiable input-output 
values will be suggested. 

Imagine a ’’national resource pool” consisting of all the pro- 
ductive output, instantaneous and accumulated (capital), of the 
population, as pictured in Figure 7 . With a growing population, this 
pool can increase merely by the greater numbers of people entering 
the pool than leaving it, assuming the productive capacities of the 
entering and leaving persons are the same. In order for the people 
entering the work force to be able to perform most tasks, they 
require some training, at least in the language and customs of the
nation. Above this minimum — let's say, unskilled laborer 

"Productive output" is here used in a very broad sense to cover any
human activity which has social or private value. Later, a specific 
kind of productive output and a measure for such output will be 
described. 
NATIONAL RESOURCE POOL
FIGURE 7



training — the question arises as to how much of the national 
resource pool shall be withdrawn from active productive activity to 
increase the future productive output of the entering (or existing) 
work force. The question is similar to that propounded by Adam 
Smith in Wealth of Nations (1776): How much benefit do I forego now
in order to increase my benefits later? For example, in order to
train prospective engineers, a certain number of "experienced"
engineers must be withdrawn from active practice of their profession to
"teach" the trainees. Simultaneously, a number of unskilled laborers 
must be withdrawn from the work force to become trainees, and also 
accumulated resources must be set aside for bricks and mortar to 
build schools, rather than, say, shoe factories. This can be illus- 
trated as in Figure 8. 

Presumably, after a time, the resource value of the trainees 
will be greater than the loss of withdrawing a, b, and c from the pool. 
A time -dependent relationship is needed to express this. 

THE EDUCATIONAL PROCESS AS PART OF
THE NATIONAL RESOURCE POOL

FIGURE 8

PRODUCTIVE OUTPUT VS. TIME (curve label: INDIVIDUAL WITH TRAINING)
FIGURE 9

Figure 9 shows productive output vs. time for the "trainee"
and for the same or equivalent person without training. The two
curves form an interesting map, but the topography can be further
reduced to point values. Two possible point values are

$$\sum_m \hat{P}(m) \quad \text{and} \quad \sum_m \check{P}(m)$$



where 

$\hat{P}$: Annual productive output of "trainee".

$\check{P}$: Annual productive output of "non-trainee".
m: Years, from beginning of training or non-training
bifurcation.



However, if a decision must be made at the bifurcation point 
whether to shunt an individual to the "trainee" or to the "non-trainee" 
path, the above simple point values may be inadequate since they 
ignore the fact that some of the annual productive output occurs 
closer to the bifurcation point than others. In short, the simple sum- 
mation of annual outputs ignores the time value of productive output. 

It is suggested here that more reasonable point values of the produc- 
tive output curves are given by: 

$$\hat{W} = \sum_m \left[\hat{P}(m)\right]\left[R(r,m)\right]$$

$$\check{W} = \sum_m \left[\check{P}(m)\right]\left[R(r,m)\right]$$



where 

$\hat{W}$: The present worth of the life-cycle† productive output
of the "trainee".

$\check{W}$: The present worth of the life-cycle productivity of the
"non-trainee".

R: The present worth discount factor.
r: The discount rate.



† "Life-cycle" productive output is another way of describing the productive
output curve. It is an expression for P(1), P(2), ..., P(m), ....



The X to use in the decision rule is: 

$$X = \hat{W} - \check{W}$$

or, the present worth (at the age or date of bifurcation) of the
difference in life-cycle productive output of the "trainee" and
"non-trainee".

The recommendation to use a present worth discount factor on
the life-cycle productive output is based on the following assumption:
ASSUMPTION 1. Productive output which becomes available
n years from now has greater weight in influencing current
decisions on the allocation of resources than does the same
quantity of productive output which becomes available n + m
years from now (where n ≥ 0 and m > 0).

Assumption 1 brings with it Condition 1 . 

CONDITION 1. In any specific situation where decisions are 
made using Assumption 1, an appropriate discount rate can 
be specified. 

The choice of an appropriate discount rate requires human 
judgment, and in an educational system there is practically no way to 
prove an error in such judgment. Some comfort can be drawn from 
the hypothesis (which will be tested in the penultimate section) that 
many decisions are relatively unaffected by a change in the discount 
rate (within the range of usually selected values of 3-10%). Furthermore,
there are commonly accepted guidelines for choosing a discount
rate.† Nevertheless, the choice of using present worths of

† From the point of view of the "national resource pool", the minimum
discount rate should be equivalent to the annual rate of growth of the 
national resource pool attributable to the growth of population. From 
an institutional point of view, the appropriate rate could be the pre- 
vailing rate on loans to the institution or the rate of return on other 
investments made by the institution. 






life-cycle productive output in a decision rule, rather than, say, the
abstract student performance measure of Minimum Condition A-ii,
is predicated on the belief that the effect of an error in judgment in 
the first case (using present worth) is less than the effect of an error 
in judgment in the second case (using Condition A-ii). 
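
The sensitivity of a decision to the discount rate can be explored numerically. The Python sketch below is illustrative only: the function names, the mid-year discounting convention, and every output figure are hypothetical assumptions, not data from this study. Training is represented as four years of negative output followed by forty years of work, and the sign of the present-worth difference between "trainee" and "non-trainee" is evaluated at each whole-percent rate from 3% to 10%.

```python
def present_worth(annual_outputs, r):
    """Present worth of a stream of annual outputs at discount rate r.

    Assumption: the output of year m is discounted from the middle of
    that year, i.e. by (1 + r) ** (m - 0.5).
    """
    return sum(p / (1.0 + r) ** (m - 0.5)
               for m, p in enumerate(annual_outputs, start=1))


def decisions_by_rate(p_trainee, p_non_trainee, rates):
    """For each rate, True if the trainee path has the larger present worth."""
    return [present_worth(p_trainee, r) > present_worth(p_non_trainee, r)
            for r in rates]


# Hypothetical figures (arbitrary units per year), invented for illustration.
rates = [k / 100 for k in range(3, 11)]        # 3%, 4%, ..., 10%
non_trainee = [8.0] * 44
robust_case = [-3.0] * 4 + [16.0] * 40         # large gain from training
marginal_case = [-3.0] * 4 + [12.0] * 40       # modest gain from training

robust = decisions_by_rate(robust_case, non_trainee, rates)
marginal = decisions_by_rate(marginal_case, non_trainee, rates)
```

With these invented figures the first case favors training at every rate in the range, while the second favors training at 3% but not at 10%. Only decisions near the margin depend on the rate chosen.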

Returning now to the P(m) given above, it is seen that during
the training or educational period productive output is consumed, i.e.,
withdrawn from the national resource pool. It will be convenient to
treat this "negative" productive output as a separate quantity. Also,
anticipating the form in which data on productive output are currently
available, "m" is redefined to mean "years of experience". Therefore:



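$$X = \hat{W} - \check{W} - V$$

$$= \sum_m \left[\hat{P}(m)\right]\left[R(r,m,T)\right] - \sum_m \left[\check{P}(m)\right]\left[R(r,m)\right] - \sum_t \left[D'(t)\right]\left[R(r,t)\right]$$

$$= \sum_{m=1}^{b-a-T} \hat{P}(m)\left\{\frac{1}{1+r}\right\}^{T+m-\frac{1}{2}} - \sum_{m=1}^{b-a} \check{P}(m)\left\{\frac{1}{1+r}\right\}^{m-\frac{1}{2}} - \sum_{t} D'(t)\,R(r,t)$$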









where the redefined and new symbols are:

$\hat{W}$: The present worth of the life-cycle productive output of
the "trainee", excluding educational costs.

V: The present worth of the educational costs.

D': The annual educational costs.
t: Years from bifurcation date.

T: Nominal time-span for education or training.

a: Age at which individuals enter the system
(age at bifurcation point).
b: Retirement age.
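
As a minimal sketch of how the present-worth difference less educational costs might be computed, the Python function below sums the three discounted streams. The function and argument names are hypothetical. The trainee and non-trainee terms use the mid-year exponents T + m - 1/2 and m - 1/2; applying the same mid-year convention (t - 1/2) to the cost term is an assumption made here for illustration.

```python
def discount_factor(r, years):
    """Present worth of one unit received `years` after the bifurcation date."""
    return (1.0 / (1.0 + r)) ** years


def net_present_worth(p_trainee, p_non_trainee, annual_costs, r, T):
    """X = W(trainee) - W(non-trainee) - V.

    p_trainee[m-1]     : trainee output in year of experience m (starts after T)
    p_non_trainee[m-1] : non-trainee output in year of experience m
    annual_costs[t-1]  : educational cost D'(t) in year t of training
    The mid-year exponent on the cost term is an assumption.
    """
    w_trainee = sum(p * discount_factor(r, T + m - 0.5)
                    for m, p in enumerate(p_trainee, start=1))
    w_non_trainee = sum(p * discount_factor(r, m - 0.5)
                        for m, p in enumerate(p_non_trainee, start=1))
    v = sum(d * discount_factor(r, t - 0.5)
            for t, d in enumerate(annual_costs, start=1))
    return w_trainee - w_non_trainee - v


# Hypothetical figures: 2 years of training, retirement 5 years after entry.
x_undiscounted = net_present_worth([10.0] * 3, [5.0] * 5, [2.0, 2.0], r=0.0, T=2)
x_discounted = net_present_worth([10.0] * 3, [5.0] * 5, [2.0, 2.0], r=0.06, T=2)
```

At a zero discount rate the result reduces to the simple difference of the sums (30 - 25 - 4 = 1). A positive rate lowers it, because the trainee's output arrives later than the non-trainee's.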
In the foregoing, the effect of individual and educational differences
on the $\hat{P}$, $\check{P}$, and D' has not been considered. If these
differences are taken into consideration, then the productive output 
for a given individual will correspond to one of the family of curves 
shown in Figure 10. The question of individual and educational dif- 
ferences will now be examined in more detail, first under ideal and 
then under more realistic conditions. Furthermore, an attempt will 
be made to apply the concepts, expounded above for a macro-system,
to sub-units of the macro-system.




EFFECT ON PRODUCTIVE OUTPUT FROM 
INDIVIDUAL AND EDUCATIONAL DIFFERENCES 

FIGURE 10 

B. Ideal Conditions 

If one could state the amount of productive output during each 
future year of a student's life attributable to specific personality
factors and to specific performance scores on a specific version of
a sub-unit of a total learning experience, given the history of
performance scores on all other sub-units, then a measure of the "gross
value" of the student's performance in the sub-unit could be obtained
from the present worth of the sum of these stated annual productive 
outputs. Furthermore, a "net value" could be obtained by subtracting 
from the "gross value" the present worth of the productive assets 
used in providing to the student the sub-unit of learning experience. 



Explicitly, the conditions for the ideal case are: 

i. A nominally described teaching-learning program, divided
into various sequences of sub-units, with various versions of 
each sub-unit, each of which can be separately described and 
analyzed. 

ii. A time span for completing i, and each sub-unit of i. 

iii. A cost associated with providing each sub-unit of i. 

iv. A student performance scoring procedure, in which the 
scores are related to those factors in the teaching-learning 
process which can be manipulated by the educator-experimenter 
and are independent of the student personality factors.

v. A personality rating procedure, in which the ratings are not
affected by the teaching-learning program. 

vi. The future increment in life-cycle productive output (of an
individual with specified personality factors and history of 
performance) attributable to a specified performance in a 
specified version of a given sub-unit. 

In this ideal case, the X used in the decision rule is given by: 



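$${}_nX_{ij\ell} = \Delta W - V,$$

where

$$\Delta W = \sum_{m=1} \left[\Delta P(m, g, \alpha, \beta, j, \ell)\right]\left[R(r, m, \tau)\right]$$

and R(r, m, τ) is a power of {1/(1+r)} (exponent illegible in the source),

where

$$V = \left[D(t, j, \ell)\right]\left[R(r, t, \tau)\right]$$

and R(r, t, τ) is likewise a power of {1/(1+r)} (exponent illegible in the source),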

where the new symbols used above are: 

n: The number associated with each individual in the
sequential sampling and decision rules.

ℓ: Sub-unit designation.
j: A version of the sub-unit.

i: The sequential number assigned to each "individual"
in "jℓ".

ΔP: The increment in life-cycle productive output attributable
to going through the "jℓ" sub-unit.

ΔW: The present worth of the increment in life-cycle
productive output.

g: Student performance score.†

α: Student personality rating.

β: History of performance on other sub-units.

τ: Time span required by student to complete i.
t: Time span required by student to complete the
"jℓ" sub-unit.

D: The total costs associated with a sub-unit.

In this ideal case, the exact information on future productive
output and on learning time for sub-units which come after the "jℓ"
sub-unit is presumed to be available at the instant when the student
completes the "jℓ" sub-unit. Since this is obviously impossible,
estimates for ΔP and τ must be found. Also, ΔP implies that in the
ideal case the increment in productivity is directly measurable,
something which is rarely possible. Most likely, ΔP will have to be
derived from the difference of two P's. Consideration will first be
given to the question of how to obtain estimates for P's and τ's, and
then the possible ways of obtaining ΔP will be considered.

† "g" is independent of "t". If the performance specifications include
a measure on speed, then this is reflected in the performance score.

If the life-cycle productive output of individuals who have
previously gone through the "ℓ-th" sub-unit and who have the same
g, α, β characteristics as the student who is currently completing
the "ℓ-th" sub-unit is available, then it is suggested that an estimate
of P(m) for the student can be obtained from projections of the
P(m) of the "old grads". The data on past productivity from which
the estimates of future productivity will be made are designated by
P(y', m, g, α, j, ℓ), where y' indicates the date on which "old grads"
entered productive activity.

It is also possible to obtain an estimate of τ for the current
student by matching the student's history of "t" on all sub-units up
to and including the "ℓ-th" sub-unit with the history of "t" of the
"old grads" and then projecting from the τ(y') of this matched group
to an estimated τ for the current student.

There are various methods for making forecasts, such as are
suggested above for P and τ, from data on previous events to
projected future events. All such forecasting methods presume a
certain stability of the environment in which the events occur. Such
stability does not necessarily mean that no change takes place, but
rather that if changes do occur, the rate of change should be stable.

The practical application of much of what follows below 
depends upon the exactness of the forecasting and the ability to 
recognize when the assumptions of stability are being violated. 

Stated another way, the recommendation to make forecasts of future 
productive output from data of previous output is based on the follow- 
ing assumption: 



ASSUMPTION 2 . The factors which affect the relationship 
between an educational experience and subsequent productive 
output remain stable and discernible, 

and 

CONDITION 2 . An appropriate transform can be specified 
for converting data on previous productive outputs to 
estimates of future productive output. 

Since general concepts are being developed in this section, 
the possible transforms that could be used will not be discussed here. 
A detailed example of one such transform, for the life-cycle
productive output of engineers, is given in Section V. Two comments,
however, are pertinent at this point.

First, a word of caution about the indiscriminate use of
mathematical curve-fitting techniques: a graphic display of the data
may help in discerning anomalies or violations of Assumption 2. For
example, Figure 11 shows a graphic plot of some hypothetical
P(y', m) for persons who started productive activity in the years
(y') 1910, 1920, 1930, 1940, 1950, 1960. The dips in the curves at
the cross-marks show the influence of the anomalous depression
years.
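
A numerical check can complement, though not replace, the graphic display. The Python sketch below is one hypothetical way to do this and is not the transform described in Section V: for a fixed year of experience it fits a least-squares straight line to output against starting year and flags any cohort whose residual exceeds a chosen tolerance. The function names, the straight-line trend, the tolerance, and the figures are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_anomalies(start_years, outputs, tolerance):
    """Return the starting years whose output departs from the fitted
    trend by more than `tolerance` (in the units of `outputs`)."""
    slope, intercept = fit_line(start_years, outputs)
    return [year for year, p in zip(start_years, outputs)
            if abs(p - (slope * year + intercept)) > tolerance]


# Hypothetical outputs at one fixed year of experience, by starting year.
start_years = [1910, 1920, 1930, 1940, 1950, 1960]
outputs = [10.0, 12.0, 9.0, 16.0, 18.0, 20.0]   # the 1930 cohort dips
suspect = flag_anomalies(start_years, outputs, tolerance=2.0)
```

A flagged cohort is a candidate for closer inspection rather than automatic exclusion. Whether it violates Assumption 2 remains a matter of judgment.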

Second, by the very nature of the data shown in Figure 11,
as "m" increases, the number of data points available for forecasting
P(y, m) decreases. Therefore, the forecasts of the productive output
for the later years of experience are more subject to error. However,
by using the sum of the present worths of the expected productive
output for each year of experience in the calculation of ${}_nX_{ij\ell}$,
the effect of the larger errors in forecasting the output of later years
is partly offset by the relatively smaller weighting given to output of
these later years.
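
The size of this offsetting effect can be gauged with a short calculation. The Python sketch below assumes a mid-year discount weight of (1/(1+r)) raised to m - 1/2, a flat output stream, a 6% rate, and a 40-year horizon. All of these are illustrative assumptions, and the function names are hypothetical.

```python
def discount_weight(r, m):
    """Weight given to the output of year of experience m (mid-year, assumed)."""
    return (1.0 / (1.0 + r)) ** (m - 0.5)


def share_of_present_worth(r, first_year, last_year, horizon):
    """Fraction of the present worth of a flat output stream over `horizon`
    years that comes from years first_year..last_year inclusive."""
    total = sum(discount_weight(r, m) for m in range(1, horizon + 1))
    part = sum(discount_weight(r, m) for m in range(first_year, last_year + 1))
    return part / total


# Share contributed by the last ten of forty years, with and without discounting.
late_share_discounted = share_of_present_worth(0.06, 31, 40, 40)
late_share_undiscounted = share_of_present_worth(0.0, 31, 40, 40)
```

Without discounting, the last ten years carry one quarter of the total. At 6% they carry less than one tenth, so a given percentage error in forecasting those years has a correspondingly smaller effect on X.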
SCATTER DIAGRAM OF HYPOTHETICAL PRODUCTIVE OUTPUTS

FIGURE 11

Some additional factors should now be considered, the first of
which is a mortality factor. In making projections of expected life- 
cycle productive output from the record of individuals who have 
already had a productive output for "m" years, a mortality factor 
should be included to account for the probability that a student will be 
alive during each of the "m" years of his potential life-cycle of
productive output. Also, a transform should be included to convert 
the various measures of productive input and output to one common 
measure, preferably to a monetary measure. It will be recalled 
that "productive output" is being used in this ideal case to cover any 
human activity which has social or private value and could include 
such diverse things as building bridges, writing scientific papers, 
receiving honors or prestige, enjoying leisure time, painting non- 
salable paintings, and so on. However, even in the ideal case, 
dimensional conformity is required of all the elements in an equation. 
A monetary measure, and more specifically, a dollar measure is 
recommended because much of the inputs and outputs that are likely 
to be considered are already measured in dollars. The use of a 
transform to convert all forms of productive output to dollar units 
brings with it the need for a transform that will convert dollar units 
of productive output reported in one year into dollar units of produc- 
tive output reported in another year. In other words, adjustment 
must be made for the year to year fluctuation in the value of the 
dollar. 

Consider now the problem of finding a ΔP, the increment in
life-cycle productive output attributable to going through a particular
sub-unit. In some rare cases it may be possible to match the
productive output ($\hat{P}$) of individuals who have had the "ℓ-th" sub-unit
with the productive output ($\tilde{P}$) of individuals who have had all other
sub-units except the "ℓ-th". Incorporating all of the above ideas gives
the following modified ideal case.

C. Modifications to the Ideal Conditions 

1. Conditions for the Modified Ideal Case Using
p(y, $\hat{P}$) and p(y, $\tilde{P}$):

i. A nominally described teaching-learning program, divided 
into various versions of each sub-unit, each of which can be 
separately described and analyzed. 

ii. A time-span for completing i, and each sub-unit of i. 

iii. A cost associated with providing each sub-unit of i. 

iv. An objective, stationary, student performance scoring proce- 
dure (i.e., where scores obtained now have the same signi- 
ficance as scores obtained some years ago), where the scores 
are related to those factors in the teaching-learning process 
which can be manipulated by the educator-experimenter and 
are independent of the student personality factors. 

v. A personality rating procedure, in which the ratings are not 
affected by the teaching-learning program. 

vi. Data on the life-cycle productive output of individuals who
have previously completed all sub-units of i. The data are
sub-classified according to personality factors, history of
performance on all sub-units of i, the date of entering productive
activity, and for each year of experience.

vii. Data on the life-cycle productive output of individuals who
have previously completed all except the "ℓ-th" sub-unit of i.
The data are sub-classified as in vi.

viii. A transform which converts the data given by vi and vii into
expected (or future) life-cycle productive output data.



ix. A transform which converts all measures of productive output 
to a common monetary measure. 

x. A transform which converts monetary values reported for one 
year into the equivalent monetary value of any other year. 

xi. Data on the probability of survival for individuals who do and 
for those who do not go through i. The probability of survival 
at age "a" at the bifurcation date is equal to one.



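The new expression for X under these conditions is:

$${}_nX_{ij\ell} = \hat{W} - \tilde{W} - V$$

$$= \sum_m \left[p\big(y, d(f(\hat{P}(y', m, g, \alpha, \beta, j, \ell)))\big)\right]\left[R\big(r, m, p(y, \hat{\tau}(y', \alpha, \beta, j, \ell, t))\big)\right]\cdot\left[\hat{M}\big(a, m, p(y, \hat{\tau}(y', \alpha, \beta, j, \ell, t))\big)\right]$$

$$\quad - \sum_m \left[p\big(y, d(f(\tilde{P}(y', m, \alpha, \beta)))\big)\right]\left[R\big(r, m, p(y, \tilde{\tau}(y', \alpha, \beta, t))\big)\right]\cdot\left[\tilde{M}\big(a, m, p(y, \tilde{\tau}(y', \alpha, \beta, t))\big)\right] - V$$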




where 



$$\hat{\tau} \equiv \tau(y', \alpha, \beta, j, \ell, t)$$

$$\tilde{\tau} \equiv \tau(y', \alpha, \beta, t)$$

and 

A 

y = y + p(y#T)- t 
y = y + p(y*T)- T 

in which y and y can best be found by successive approximations . 

The new and redefined symbols in the above expressions are: 

W: Present worth of life-cycle productive output, exclusive
of educational costs.

P: Annual productive output.

$\hat{\ }$ : A sign to indicate that the symbol below the sign is
associated with the individual who has had all sub-units
in i.

$\tilde{\ }$ : A sign to indicate that the symbol below the sign is
associated with the individual who has had all but the
"ℓ-th" sub-unit of i.

y: Current date.

y': Date on which individual who previously completed i
started productive output.

m: Years of experience, since starting productive output. 

p: A transform which operates on the history of past events 
to give an estimate of future events. 

f : A transform which converts all forms of productive output 
to dollar values of the year that the output occurred in. 

d: A transform which converts dollars of any given year 
into dollar values of any other specified year. 

M: A mortality factor (or more correctly, a probability of 
survival factor). 



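In the event that the p(y, τ) are less than one-half year, the
above formulation is considerably simplified, since we can ignore
$\hat{\tau}$, $\tilde{\tau}$, and t in the discount factor R. Thus:

$${}_nX_{ij\ell} = \sum_{m=1}^{b-a} \left[p\big(y, d(f(\hat{P}(y', m, g, \alpha, \beta, j, \ell)))\big)\right]\left[\left\{\frac{1}{1+r}\right\}^{m-\frac{1}{2}}\right]\cdot\left[\hat{M}(a+m-\tfrac{1}{2})\right]$$

$$\quad - \sum_{m=1}^{b-a} \left[p\big(y, d(f(\tilde{P}(y', m, \alpha, \beta)))\big)\right]\left[\left\{\frac{1}{1+r}\right\}^{m-\frac{1}{2}}\right]\left[\tilde{M}(a+m-\tfrac{1}{2})\right]$$

$$\quad - D(t, j, \ell)$$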

Either of the above formulations may be adequate for the case where
the "ℓ-th" sub-unit under consideration is the last one in the sequence of
sub-units of i, and also in the case where the student's performance
in one sub-unit is independent of his performance in another sub-unit,
an assumption which is often made for the sake of mathematical
simplicity,† but one which seldom makes sense in most teaching-learning
programs. The temptation to use simplifying assumptions
is understandable, for in this case the logical move is to use the
performance results on past and current sub-units to fill in the conditional
probabilities of performance on future sub-units, a procedure
which becomes exceedingly unwieldy and increasingly imprecise as
the number of sub-units increases.
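
The simplified formulation (learning time under one-half year) can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation. The projected annual outputs are taken as already forecast and expressed in current dollars. The survival factors are supplied as functions of age with hypothetical names. The discount and survival factors are both evaluated at mid-year, and the sub-unit cost is entered undiscounted.

```python
def present_worth(projected_outputs, r, start_age, survival):
    """Sum over years of experience m of
    output(m) * {1/(1+r)}**(m - 1/2) * survival(start_age + m - 1/2)."""
    total = 0.0
    for m, output in enumerate(projected_outputs, start=1):
        discount = (1.0 / (1.0 + r)) ** (m - 0.5)
        total += output * discount * survival(start_age + m - 0.5)
    return total


def simplified_x(with_sub_unit, without_sub_unit, r, start_age,
                 survival_with, survival_without, sub_unit_cost):
    """Present worth with the sub-unit, less present worth without it,
    less the (undiscounted) cost of the sub-unit."""
    return (present_worth(with_sub_unit, r, start_age, survival_with)
            - present_worth(without_sub_unit, r, start_age, survival_without)
            - sub_unit_cost)


def always_alive(age):
    """Hypothetical survival factor: certain survival at every age."""
    return 1.0


# Hypothetical two-year streams, chosen so the arithmetic is easy to follow.
x_no_discount = simplified_x([10.0, 10.0], [8.0, 8.0], 0.0, 22,
                             always_alive, always_alive, 1.0)
x_five_percent = simplified_x([10.0, 10.0], [8.0, 8.0], 0.05, 22,
                              always_alive, always_alive, 1.0)
```

With certain survival and a zero rate the result is 20 - 16 - 1 = 3. A survival table would replace `always_alive` in any real application.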



Since initially it may be difficult to accumulate enough
$\tilde{P}$(y', m, α, β) to use in obtaining satisfactory p(y, $\tilde{P}$(y', m, α, β)),
two other possibilities should be examined.

† Smallwood and Pask both make this assumption in their adaptive
system models.



One of the possibilities is that the proportional part that each 
sub-unit contributes to the overall subsequent productive output can 
be stated outright, in which case other conditions prevail: 

2. Conditions for the Modified Ideal Case Using
p(y, $\hat{P}$), p(y, $\check{P}$) and Proportionality Factors.

i. Same as C-l-i. 

ii. Same as C-l-ii. 

iii. Same as C-l-iii. 

iv. Same as C-l-iv. 

v. Same as C-l-v. 

vi. Data on the life-cycle productive output of individuals who
have previously completed i. The data are sub-classified 
according to personality factors, history of performance on 
all sub-units of i, the date of entering productive activity, 
and for each of the years of experience. 

vii. Data on the life-cycle productive output of individuals who did
not go through i, but who had the same initial qualifications 
as those who went through i. The data are sub-classified 
according to personality factors, the date of entering produc- 
tive activity, and for each of the years of experience. 

viii. Same as C-l-viii. 

ix. Same as C-l-ix. 

x. Same as C-l-x. 

xi. Same as C-l-xi. 

xii. Proportionality factors which indicate the part that each sub- 
unit contributes to subsequent overall productive output. 



For this case: 
where the new symbols are:

c: A proportionality or weighting factor, where

$$\sum_{\ell} c(\ell) = 1.0 .$$

$\check{y}$: Date on which individual who does not go through i starts
productive output; also the bifurcation date.

Simplifications can be made in the above formulation if p(y, τ) is less
than one-half year.

It should be emphasized that "c" is a subjective measure. If
objective measures are available, they would be used directly without
introducing "c", as for example in the comparison of $\hat{P}$ and $\tilde{P}$ given
above. The difficulty with this subjective measure is that there is
less concrete evidence and there are fewer guidelines available to
help determine the magnitude of "c" than for any other element that
enters into the determination of ${}_nX_{ij\ell}$. A common assumption,
particularly where the sub-units are very small blocks of learning,
is that each sub-unit has equal importance, and therefore all c-values
are equal. Another common practice (for example, with the semester
courses of a college or high school program) is to divide the sub-units
into two major categories† and within each category, weight the sub-unit
in direct proportion to the number of teaching hours allocated
to that sub-unit. This practice assumes that within each category
importance is related to teaching time and presumes that the amount
of teaching time required for each sub-unit can be rationally
resolved. D. Rosenthal, A. Rosenstein, and G. Wiseman [37] have
suggested a novel way for a faculty committee to resolve the question
of how to specify the relative (though still subjective) weighting of
the sub-units. Nevertheless, the determination of "c" remains one
of the more interesting areas for further research.

In a comprehensive application of an adaptive teaching system,
one may have to settle for subjective approximate values for "c"
when the system is inaugurated but include a feature for the accumulation
of $\tilde{P}$ data which, in time, can be used to supplant the use of
"c". In many cases, the adaptive decisions will not be affected even
by the choice of an inappropriate "c", particularly in those cases
where:

$$\left.\begin{array}{l} c(\hat{W} - \check{W}) \\ c^{*}(\hat{W} - \check{W}) \end{array}\right\} \gg V \quad \text{or} \quad \left.\begin{array}{l} c(\hat{W} - \check{W}) \\ c^{*}(\hat{W} - \check{W}) \end{array}\right\} \ll V$$

where "c" is the value actually used and "c*" is the unknown
"correct" value. This contention will be examined further in
Section VI.
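
The insensitivity of a decision to the choice of "c" can be checked directly. The Python sketch below assumes that X for a sub-unit takes the form c times the present-worth difference between those who do and do not go through i, less the cost V. That form is inferred here from the inequality above and is an assumption, as are the function names and all figures. Because this expression is linear in c, comparing its sign at the two ends of a plausible range for c is sufficient.

```python
def x_value(c, w_with, w_without, v):
    """Assumed form: X = c * (W_with - W_without) - V."""
    return c * (w_with - w_without) - v


def decision_unaffected(c_low, c_high, w_with, w_without, v):
    """True if the sign of X is the same at both ends of the range of c.

    X is linear in c, so agreement at the endpoints implies agreement
    for every c in between.
    """
    low_positive = x_value(c_low, w_with, w_without, v) > 0
    high_positive = x_value(c_high, w_with, w_without, v) > 0
    return low_positive == high_positive


# Hypothetical present worths, with c believed to lie between 0.05 and 0.20.
cheap_sub_unit = decision_unaffected(0.05, 0.20, 200.0, 100.0, v=2.0)
costly_sub_unit = decision_unaffected(0.05, 0.20, 200.0, 100.0, v=10.0)
```

In the first case the weighted gain exceeds the cost across the whole range of c, so the decision does not depend on which value is used. In the second the cost falls inside the range of weighted gains, and the choice of c matters.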

Returning now to the problem of how to circumvent the dearth
of data on $\tilde{P}$(y', m, α, β), another possibility to consider is to forego
the analysis on the sub-units of i and restrict oneself to making
analyses for the entire i, in which case neither $\tilde{P}$ nor "c" is required.

† For example, one category could include all the laboratory and
"non-academic" courses, while the other category could include all
the lecture-recitation courses.



In this situation "j" could indicate a specific sequence of variations
of the sub-units. If there are many such sub-units and variations of
sub-units, then the number of "j" will be very large, and we are
back to the old problem of fragmenting the P(y') into so many subdivisions
that very large numbers of P(y') will be needed to make
reasonable forecasts of the future P(y). On the other hand, if there
are few or no sub-units in i worthy of separate analysis (such as in
short courses and in many industrial training situations), then this
alternative is entirely reasonable. The conditions for this case are
given below.

3. Conditions for the Modified Ideal Case Using
p(y, $\hat{P}$) and p(y, $\check{P}$) for the Entire Learning Program

i. A nominally described teaching-learning program.

ii. A time-span for completing i.

iii. A cost associated with completing i.

iv. Same as C-1-iv.

v. Same as C-1-v.

vi. Same as C-2-vi.

vii. Same as C-2-vii.

viii. Same as C-1-viii.

ix. Same as C-1-ix.

x. Same as C-1-x.

xi. Same as C-1-xi.

The formulation of X for this case is straightforward.
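$${}_nX_{ij} = \hat{W} - \check{W} - V$$

$$= \sum_m \left[p\big(y, d(f(\hat{P}(y', m, g, \alpha)))\big)\right]\left[R(r, m, T)\right]\left[\hat{M}(a, m, T)\right] - \sum_m \left[p\big(y, d(f(\check{P}(y', m, \alpha)))\big)\right]\left[R(r, m)\right]\left[\check{M}(a, m)\right] - V$$

$$= \sum_{m=1}^{b-a-T} \left[p\big(y, d(f(\hat{P}(y', m, g, \alpha)))\big)\right]\left[\left\{\frac{1}{1+r}\right\}^{T+m-\frac{1}{2}}\right]\left[\hat{M}(a+m-\tfrac{1}{2}+T)\right]$$

$$\quad - \sum_{m=1}^{b-a} \left[p\big(y, d(f(\check{P}(y', m, \alpha)))\big)\right]\left[\left\{\frac{1}{1+r}\right\}^{m-\frac{1}{2}}\right]\left[\check{M}(a+m-\tfrac{1}{2})\right] - V$$

If T, τ, and t are less than one-half year, the above formulation
can be simplified to:

$${}_nX_{ij\ell} = \sum_{m=1}^{b-a} \left[p\big(y, d(f(\hat{P}(y', m, g, \alpha)))\big)\right]\left[\left\{\frac{1}{1+r}\right\}^{m-\frac{1}{2}}\right]\left[\hat{M}(a+m-\tfrac{1}{2})\right]$$

$$\quad - \sum_{m=1}^{b-a} \left[p\big(y, d(f(\check{P}(y', m, \alpha)))\big)\right]\left[\left\{\frac{1}{1+r}\right\}^{m-\frac{1}{2}}\right]\left[\check{M}(a+m-\tfrac{1}{2})\right]$$

$$\quad - D'(T)$$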

The ideal case has been treated at some length because it 
represents an attainable set of conditions. Admittedly, currently 
available conditions are far removed from the ideal conditions, and 
it will be necessary to introduce additional assumptions to obtain a 
model that can be used today. The practical procedure would be to 
start using the strongest model that will work with the currently 
available data and simultaneously start gathering data in a form 



suitable for use in a model more closely approximating the ideal 
model. 

The discrepancies between the ideal conditions and the condi- 
tions currently prevailing are given below: 

a. Student performance scoring procedures generally are 
not objective, stationary, independent of student person- 
ality factors, nor related only to the factors in the 
teaching-learning process which can be manipulated by 
the educator-experimenter.

b. Personality rating procedures which are independent of
the teaching-learning process and which are related to
life-cycle productive output are not available.

c. Data on life-cycle productive output are not generally sub-classified
according to the (unavailable) personality
factors, nor according to the (available) history of
performance on all sub-units of i, nor are all the elements
of an individual's output recorded.

D. Current Conditions 

The question now arises: can a reasonable estimate of X be 
obtained from existing data? The answer depends in part on where 
the data are coming from. Some institutions have available fairly 
detailed information on individual graduates (see Section V on engi- 
neering graduates of the University of California); in other cases 
individual records are not available and only group mean or median 
figures are quoted. For example, original data on individuals in old
Bureau of Census and Labor Department surveys have been lost or
destroyed, and only group median figures are available. The answer
to the question of whether reasonable estimates of X can be obtained 
from existing data depends also in part on the further assumptions 



one is willing to make in order to reconcile existing data with the
modified ideal set of conditions given in C above.

For example, most of the old data on productive output are 
stated only in terms of dollar earnings, with no account being given 
to other possible signs of non-dollar productive output such as 
scientific publications, service to the community, etc. There are 
many ways of arguing this issue, from the one extreme which says 
that most apparently non-dollar productive output is eventually 
reflected in higher earnings, to the other extreme which says that 
our society accurately reflects the value it places on productive output 
by the dollar compensation it makes for such output. Both extreme 
views are certainly untenable for many individual cases but may be 
fairly accurate when median figures for large groups of individuals 
are considered. The assumptions that are suggested for the use of 
old data are: 

ASSUMPTION 3 . Annual earnings are an adequate measure of 
productive output. 

ASSUMPTION 4 . Where data on the annual earnings of 
individuals in a specified group are not available, the median 
annual earnings of the group can be used. 

Using Assumptions 3 and 4, 

$$f\big(P(y', m, \ldots)\big) = \$(y', m, \ldots)$$

where the $ sign represents median annual earnings, in dollars. 

Since dollars have different values in different years, in order 
to get some consistent value system, the following assumption is 



made: 



ASSUMPTION 5. A stable reference for dollar values is the
purchasing power (on a specified list of commodities and
services) of the dollar.



Using this assumption, the following simple d-transform is suggested:

$$d(\;\cdot\;) = \frac{\mathrm{CPI}(y)}{\mathrm{CPI}(y'+m)}\,(\;\cdot\;)$$



where CPI(z) is the Consumer Price Index for the z-th year. A word 
of caution about the use of CPI: from time to time the specified list 
of commodities and services used for evaluating the purchasing 
power of the dollar changes; also, the list is designed to reflect the 
normal purchases of the urban moderate income family, and the group 
whose $(y' , m, . . . ) is being observed may not fall into this category. 
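
A minimal Python sketch of this d-transform follows. The function name and the index values are hypothetical and serve only to show the arithmetic; they are not actual Consumer Price Index figures.

```python
def to_reference_dollars(amount, cpi, year_earned, reference_year):
    """Convert dollars reported in `year_earned` (that is, y' + m) into
    dollars of `reference_year` (y) by the ratio of the two index values."""
    return amount * cpi[reference_year] / cpi[year_earned]


# Made-up index values, for illustration only.
cpi = {1950: 50.0, 1960: 60.0}

# Median earnings of 100 reported for 1950, restated in 1960 dollars.
restated = to_reference_dollars(100.0, cpi, year_earned=1950, reference_year=1960)
```

Here 100 dollars reported for 1950 is restated as 120 dollars of 1960, since the hypothetical index rose by one fifth between the two years.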



Looking now at the ideal requirement that performance scores 
should be independent of personality factors, it becomes apparent 
that not only are the personality factors not specified in old data, but 
that these factors are inextricably mixed into the performance scores.
This gives rise to the following further assumption for the use of old
data:

ASSUMPTION 6. Personality factors need not be excluded 
from performance scoring procedures. 



This gives rise to a new symbol, g', which represents performance
scores that reflect both differences in the teaching-learning
program and individual personality differences and eliminates α from
the formulation of X.



It must furthermore be recognized that g' is usually not 
obtained from an objective scoring procedure, but rather from a 
relative ranking procedure and that the scoring procedure is not 
stationary. It is therefore necessary to make: 

ASSUMPTION 7. An adequate relationship can be found 
between previously recorded g' and currently observed g. 

Stated another way, a transform "h" is needed, which serves 
to map elements in a set of g to elements in a set of g', or 

\[ g = h(g') \]

This h-transform will, of necessity, be different for each 
specific application, and an example of one such transform is given 
in Section V. 

There is a further complication. Very often the median 
earnings data are not sub-classified according to g', but instead an 
overall median including all values of g' is quoted. However, it may 
still be possible to use these global median figures if, from 
independent sources, a relationship can be established between 
performance in school and subsequent life-cycle earnings. Then when finding 
p(y, P(y', m, g', ...)), instead of using the P(y', m, g', ...) which 
corresponds to the "g" of a current student, one would use the 
undifferentiated P(y', m, ...) and a transform to obtain p(y, P(y', m, ...)). 
In order to do this another assumption must be made: 

ASSUMPTION 8. There is a discernible and independently 
verifiable relationship between performance in school and 
subsequent life -cycle productive output. 

There have been many studies on the relationship between 
performance in school and subsequent productive output, the vast 
majority of which report no significant relationship. As a result, 
there exists a fairly prevalent feeling that such relationships do not 
exist or, at best, can only be teased out by introducing such 
co-variables as family background, geographic area, personality factors, 
etc. However, a careful analysis of the studies on which the 
pessimistic feelings are based reveals that most of the studies were made 
on students who took their major in colleges of letters and science. 
This led to the speculation that training for professional practice 
(as in the case of engineering education) would be more highly corre- 
lated to later professional success than general education would be 
to later success in the variety of occupations in which a person could 
be engaged after such general education. 

A re-sifting of the literature on such correlational studies was 
only partly encouraging. For example, Pierson [38] reported that 
for 320 engineering graduates examined, he found a correlation of 
0.43 between their GPA on all college work and a rating of success 
in their professional life (rated by a faculty member who best knew 
the person in college). On the other hand, Havemann and West [39] 
indicate that, for the general college graduate, the low graders earn less 
than the high graders, but the highest graders are often in low-paid 
jobs such as teaching, etc. Some encouragement comes from 
Wallace [40] who, in 1954, studied alumni of the University of 
California Schools of Engineering and observed a slight tendency for 
higher salaries to go with higher grades. 

Apropos to measures of productive output other than earnings, 
Taylor [41] investigated whether engineering undergraduate grades 
were predictive of later research activity. He used 239 cases and 
measured research performance by a three -category rating. The 
tri-serial correlation between these ratings and GPA was a 
disappointing .06. But two apparently contradictory reports finally helped 
unravel the puzzle. LeBold [42] made a study of current monthly 
salaries of 3977 alumni of the Purdue University Engineering School 
and reported a positive relationship between income and scholarship 
for the group with 10 to 25 years of experience. On the other hand, 
Eurich [43] quoted two studies, one by the Hughes Aircraft Company 
and the other by the National Advisory Committee for Aeronautics 
(now absorbed by NASA), wherein for practicing engineers with 
6 to 9 years of experience, no correlation was found between their 
achievement or salaries and their college records. The significant 
point was the different number of years of experience quoted in 
LeBold's and Eurich's reported studies. Could it be that the 
differentiation in earnings is related not only to school performance but 
also becomes more pronounced with increasing years of experience? 
Actually the answer had been given years before (1928) by Gifford 
and later quoted by Bridgman [44]. Gifford found that for the 3806 
Bell Telephone System college graduate employees that were studied, 
higher salaries were associated with higher college standing, and 
lower salaries were associated with lower standing. Furthermore, 
the differences in salaries of the high college standing and the low 
college standing groups became increasingly apparent the longer they 
were employed. These findings are vividly demonstrated in Figure 12. 
However, a long time has passed since Gifford's study was made, and 
that study had been based on data from the 1890's to the 1920's. 
More recently (1962) another study of 10,000 Bell Telephone System 
college graduates had been made by the American Telephone and 
Telegraph Company. The report on this study [45] indicated that the 
employees were divided into four groups: top tenth, top third, 
mid third, and lower third of their graduating class. When they were 
cross tabulated by salary thirds, a decided relationship between rank 
in graduating class and progress in the Bell System was evident. 
That is, 51 percent of those in the top graduating third were in the 
top salary third; 40 percent of those in the lowest graduating third 
were in the lowest salary third. After this encouraging report was 
received, the American Telephone and Telegraph Company 
investigators were prevailed upon to prepare a graph similar to Gifford's for 
use in this study. The new graph is shown in Figure 13. Note that 
the difference in median salaries of the top members of the class 
and the lower members is not so great in the recent study as it was 
in the earlier study. 

In the meantime, an analysis was made of the data that was 
very conveniently made available at this time from a comprehensive 
study of the engineering graduates of the University of California 
(Los Angeles and Berkeley) conducted by Harry Case, William LeBold, 
William Diemer and their associates. At the time this analysis was 
made (1963), data on 1466 graduates from the years 1947 through 1962 
were available. All individuals reported their earnings for each year 
since graduation and also their average grades while in college. A 
check on a sample of 170 graduates revealed that student-reported 
grades and grades actually recorded by the registrar correlated at 
0.86, and hence the reported grades were thought to be sufficiently 
accurate for purposes of correlating school performance and the 
earnings received in later careers. The sample consisted of graduates 
of different years having different lengths of experience on the job. 
Because the purchasing power of money has itself changed during this 
period, all reported earnings were made comparable by converting 
them to equivalent dollars of 1962. Then the median earnings for each 
category of reported college grades were calculated as a percentage 
of the overall median for each year since graduation. This is shown 
in Figure 14. We note a similarity between the results of the 
University of California study and the later American Telephone and 
Telegraph Company study. 

^Data is available on family background, high school experience, 
personal factors, etc., on these graduates, and members of 
Dr. Case's group are making their own analyses on how these other 
factors may co-vary with earnings and grades. 

The two studies on the Bell System employees show a rela- 
tionship between relative position in the class and subsequent earnings, 
and the University of California study indicates a relationship between 
grades and subsequent earnings. In these three studies, the 
measure of each student's school performance is relative to the 
performance of the student body as a whole. We should bear in mind 
that the factors which influence a student to perform at a level to 
place him in the top third of his class, or to get an A grade, may be 
the same factors which subsequently influence his earning power. 
Inborn intelligence, drive, competitiveness, ambition can be sug- 
gested as possible factors, and it is exactly these factors which are 
not directly manipulated in most educational experiments. There- 
fore, it is only with caution and with full cognizance of the implica- 
tions of accepting Assumption 6 that one can recommend using a 
transform for modifying median expected life-cycle earnings to 
reflect different expected life-cycle earnings for students with 
different college performance scores. Such a transform, "w", depends 
on performance score and years of experience and operates on the 
undifferentiated or overall median expected life-cycle earnings: 



in the case where performance is not independent of personality 
factors, and where some change may also have occurred between the 
grading technique employed in determining "w" and that employed on 
the current students. 




or more likely: 




If, now, the further assumption -- 

ASSUMPTION 9. An individual's learning time and performance 
score for the sub-unit under examination are representative 
of the learning times and performance scores for that 
individual in all other sub-units -- 

is made, then ℓ can be eliminated from the formulation, and the 
estimated time for completing i can be found by the following 
substitution: 

\[ p(y, T) = \frac{tT}{T'} \]



where, it may be recalled, 

t: time span actually required by a student to complete 
the sub-unit. 

T' : nominal time span for completion of the sub-unit. 

T: nominal time span for completing i. 

Lastly, and perhaps the most questionable, is: 

ASSUMPTION 10. Each sub-unit contributes to future 
productive output in the same proportion that the nominal 
time span for completing each sub-unit bears to the nominal 
time span for completing the whole teaching-learning program. 

This assumption gives: 

\[ c(\ell) = \frac{T'(\ell)}{T} \]



For example, if it were ascertained that students spent 
approximately 7200 hours in and out of class in study and related activities 
during the normal four-year college period, and if the (Ŵ - W) for a 
given student is $82,600, then the "output" for an average one-hour 
learning experience (including in and out of class time) would be 
(1/7200) x $82,600, or approximately $11.50. 
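
The proportionality rule of Assumption 10 and the one-hour example can be checked with a short Python sketch. The function name `sub_unit_weight` is hypothetical, the $82,600 figure is the one quoted in the example, and the 100-minute sub-unit is an assumed illustration.

```python
def sub_unit_weight(sub_unit_hours, program_hours):
    """Proportionality factor c: nominal sub-unit time over nominal program time."""
    return sub_unit_hours / program_hours


program_hours = 7200.0      # nominal four-year program, from the example
life_cycle_gain = 82600.0   # difference in present worth quoted in the example

# Output credited to an average one-hour learning experience.
per_hour = sub_unit_weight(1.0, program_hours) * life_cycle_gain
print(round(per_hour, 2))

# Output credited to an assumed 100-minute sub-unit.
per_sub_unit = sub_unit_weight(100.0 / 60.0, program_hours) * life_cycle_gain
print(round(per_sub_unit, 2))
```

The per-hour value comes out a little under $11.50, consistent with the rounded figure in the text.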

To recapitulate, the conditions for using currently available 
old data are given below: 

i. A nominally described teaching -learning program divided into 
various versions of each sub-unit, each of which can be 
separately described and analyzed. 

ii. A nominal time span for completing i and each sub-unit of i. 

iii. A cost associated with providing each sub-unit of i. 

iv. A student performance scoring procedure. 

v. A transform for relating current scoring procedures to 
previous scoring procedures. 

vi. Data on the median life-cycle earnings for the group of 
individuals who have previously completed i. The data are 
sub-classified according to date of entering productive 
activity, and for each year of experience. 

vii. Data on the median life-cycle earnings for the group of 
individuals with the same initial characteristics as the group 
in vi, but who have not gone through i. The data are 
sub-classified according to date of entering productive activity, 
and for each year of experience. 

viii. A transform which converts the data given by vi and vii 
into expected (or future) life-cycle earning data. 

ix. A transform which converts median expected life-cycle 
earnings into expected life-cycle earnings for individuals 
with different school performance records. 

x. A transform which converts dollar earnings reported for one 
year into equivalent dollar values of any other year. 



xi. Data on the probability of survival for individuals who do and 
for those who do not go through i. 

xii. Proportionality factors which indicate the part that each sub- 
unit contributes to subsequent overall productive output. 

xiii. An estimation of the total time required for the individual 
student to complete i. 



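Using the transforms suggested above for this case, the 
\( {}_{n}X(g, t) \) for a student with (g, t) is: 

\[ {}_{n}X(g, t) = c(\ell)\,\bigl(\hat{W} - W\bigr) - V, \qquad c(\ell) = \frac{T'(\ell)}{T} \]

in which each year's earnings enter \(\hat{W}\) through a term of the form 

\[ w\Bigl(h(g'),\; p\bigl(\hat{y},\; \tfrac{\mathrm{CPI}(y)}{\mathrm{CPI}(y'+m)}\,\$(y', m)\bigr)\Bigr) \]

summed over the years of experience \( m = 1, \ldots, b - a - tT/T' \) and 
weighted by the discount factor and by the survival factor 
\( M(a + m - \tfrac{1}{2} + tT/T') \), and 

where 

\[ \hat{y} = y + \frac{tT}{T'} - t \]
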

It is appropriate, at this juncture, to examine how the data 
for the right-hand side of the above expression can be obtained. 



SECTION V 



DATA ON THE OUTPUT OF 
ENGINEERING EDUCATIONAL SYSTEMS 

It has been pointed out in Section IV that the formula for 
obtaining \( {}_{n}X(g, t) \) from existing old data would probably be most 
appropriate for educational or training situations which impart 
knowledge and skills that are of direct use in later professional practice. 
Engineering education qualifies as such a teaching-learning situation. 
Furthermore, it turns out that the only group for whom relatively 
precise records of earnings have been kept over the past fifty-five 
years is the professional engineers. It is therefore within engineer- 
ing education that the unique opportunity exists to immediately employ 
the valuation techniques described above. 



A. National Data on Engineers 

Engineers' salaries have been surveyed on a national basis 
since 1908. A composite picture of some of the survey results is 
shown in Figure 15. Table C-l of Appendix C gives detailed informa- 
tion on the sources of earnings data and mentions the adjustments 
that have to be made in order to reconcile data from different sources. 
Also indicated in Figure 15 are the 1962 median salaries of 
engineering graduates from the University of California (Berkeley and 
Los Angeles Campuses). 



The salaries shown in Figure 15 are not directly comparable, 
since the purchasing power of the dollar changed during the reported 
period. Consumer Price Index figures and Adjusting Factors for 
different years are shown in Figure 16. In using the Consumer Price 
Index, and the Adjusting Factor derived from it, one should be aware 
that the adjustment is approximate, since engineers' earnings tend 
to be higher than those of the urban moderate-income family whose 
living costs the C.P.I. is designed to measure. 

^From unpublished data. University of California Engineering 
Graduate Study, courtesy of H. W. Case, William LeBold and William 
Diemer. 

Figure 17 shows the reported salaries adjusted to 1962 
equivalent dollars. Comparison of Figures 15 and 17 reveals that 
real purchasing power has increased less dramatically than dollar 
earnings. 

Observe that Figures 15 and 17 show the median salaries 
versus years of experience for the different survey years. Not 
directly shown is the income of, say, the engineers who graduated 
in 1953. Their salaries are shown at zero years of experience on 
the 1953 curve and at five years of experience on the 1958 curve. By 
picking the data from the existing survey curves, life-cycle data for 
engineers who graduated in different years can be obtained. The 
unadjusted life -cycle earnings are shown in Figure 18. The adjusted 
life -cycle earnings are shown in Figure 19. 
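
The procedure of picking life-cycle data off the survey curves can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `cohort_life_cycle` and every salary figure in `surveys` are hypothetical, chosen only to show how one graduating class is traced diagonally across survey years.

```python
def cohort_life_cycle(surveys, graduation_year):
    """Trace one graduating class across cross-sectional salary surveys.

    `surveys` maps survey year -> {years of experience: median salary}.
    A class graduating in year G appears at experience (S - G) in survey year S.
    """
    points = {}
    for survey_year, curve in sorted(surveys.items()):
        experience = survey_year - graduation_year
        if experience in curve:
            points[experience] = curve[experience]
    return points


# Hypothetical survey curves (not the data of Figure 15).
surveys = {
    1953: {0: 4000, 5: 6000, 10: 7500},
    1958: {0: 5500, 5: 7800, 10: 9500},
    1963: {0: 7000, 5: 9500, 10: 11500},
}

life_cycle_1953 = cohort_life_cycle(surveys, 1953)
print(life_cycle_1953)  # {0: 4000, 5: 7800, 10: 11500}
```

Each point comes from a different survey year, which is why the resulting unadjusted curve still has to be converted to equivalent dollars before cohorts are compared.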

Earnings are seldom shown in this form, but this is the form 
needed for comparing life earnings of engineers who graduate at 
different times and is also necessary for projecting expected life 
earnings of graduates of, say, the 1962 class. Shown in Figure 19 
are the projected life-cycle earnings of the 1962 graduate. A middle, 
a high, and a low estimate are indicated. 

FIGURE 18: LIFE-CYCLE EARNINGS FOR ENGINEERS 

FIGURE 21 

Based on the projections shown in Figure 19, the total 
expected life earnings for the "average" engineer graduating in 1962 
is approximately $579,000. The present worth of the expected life 
earnings, adjusted for mortality and discounted at different rates 
(3%, 4½%, and 6%), is shown in Figures 20 and 21. Figure 20 shows 
the present worth at age 22, the supposed age of graduation from 
engineering school. Figure 21 shows the present worth at age 18, 
the supposed age at which a high school graduate would choose between 
going to engineering school or going to work. 
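
A present-worth calculation of this kind can be sketched in Python as below. The sketch rests on assumptions that are not taken from the report: earnings are treated as received at mid-year, the flat earnings stream is invented, and `linear_survival` is a hypothetical stand-in for the survival factors of Table C-3.

```python
def present_worth(earnings, rate, survival=None, start_age=22):
    """Discounted, mortality-adjusted sum of a yearly earnings stream.

    earnings[m - 1] is the expected earnings in the m-th working year.
    Mid-year discounting is an assumption of this sketch.
    """
    total = 0.0
    for m, amount in enumerate(earnings, start=1):
        discount = (1.0 + rate) ** -(m - 0.5)
        alive = 1.0 if survival is None else survival(start_age + m - 0.5)
        total += amount * discount * alive
    return total


def linear_survival(age):
    """Hypothetical survival factor, for illustration only."""
    return max(0.0, 1.0 - 0.004 * (age - 22))


earnings = [10000.0] * 40  # invented flat stream from age 22 to 62
for rate in (0.03, 0.045, 0.06):
    print(rate, round(present_worth(earnings, rate, survival=linear_survival)))
```

Raising the discount rate or lowering the survival factor can only reduce the present worth, which is the qualitative behavior shown in Figures 20 and 21.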

B. National Data on Comparison Group 

On the assumption that an income somewhat more than the 
national median income would be earned by the high school graduate 
who had the ability to enter engineering school but instead chose to 
work, the median salary of craftsmen, foremen, and kindred workers 
was selected for comparison purposes. For this group, national 
salary surveys in relation to age are available for two years -- 1946 
and 1951. For other years, only the overall median salary is re- 
ported. However, median income figures for all males by age are 
available, and these are used as shape curves to derive the salary 
curves for craftsmen, foremen, etc. The available data are given in 
Tables C-4 and C-5 of Appendix C. 

Figure 22 shows two curves for each of the survey years 1946 
and 1951. Notice that the salary curves for craftsmen, foremen, etc., 
closely parallel the income curve of all males except at the extremes 
where the latter curve drops off rapidly. Another observation is 
that the median earnings for craftsmen, etc., occur at an age three 
years later than the median earnings for all males. Bearing these 
facts in mind, one can derive the salary curve for craftsmen, etc., 
for, say, the year 1961 as follows: The earning curve for that year 
for all males is plotted; the overall median is located at Point A; 
the known overall median salary for craftsmen, etc., is located 3 
years later at Point B; then a curve is drawn such that it passes through 
Point B and is parallel to the all-males' earning curve in the middle 
of the distribution and drops off only gradually at the extremes. This 
is the derived curve for craftsmen, etc., for the year 1961. Similar 
curves are drawn for years 1949, 1955, and 1959, and all these are 
shown in Figure 23. The curves shown in Figure 23 are not directly 
comparable since the dollar value is not the same over the years. 
Using the adjusting factors based on the Consumer Price Index, 
adjusted earnings were obtained and plotted in Figure 24. Figures 
23 and 24 show median salaries versus years of experience for the 
survey years shown. As in the case of engineers, data from these 
curves were used to obtain life-cycle curves for craftsmen, etc., 
who finished high school in different years. Figure 25 shows the 
unadjusted life-cycle earnings, and Figure 26 shows the life-cycle 
earnings adjusted to 1962 equivalent dollars. Earnings of skilled 
workers are rarely shown in this form. Some previous efforts to 
derive craftsmen's life-cycle curves, such as done by DeHaven [46] 
and by Stewart [47], have been based on the assumption that beyond 
the apprenticeship period craftsmen income remains fairly constant. 
The life-cycle earnings curve of construction workers deemed by 
both Stewart and DeHaven to be representative of high school 
graduates who chose to work rather than go to engineering school is also 
shown in Figure 25. Figure 26 includes projections for the various 
years and three estimates (high, middle, and low) for the 1958 high 
school graduates. 

^See Table C-3, Appendix C for sources of information and 
calculations of survival factor. Note that no adjustments were made for 
school attrition and rate of unemployment. The effect of 
unemployment is reflected in the basic data on median salaries, and it is 
assumed that the undetermined rate of unemployment remains 
constant. 

The mid-estimate of the total expected life-earnings for an 
average skilled worker graduating from high school in 1958 
(presumably the bifurcation date for the engineer who graduated from college 
in 1962) is approximately $317,000. The present worth at age 18 of 
this expected life earnings, adjusted for mortality and discounted 
at different rates, is shown in Figure 27. 

Room, board, transportation, and incidental expenses 
involved in the cost of an engineering education are excluded from 
the computation in this analysis. The assumption is made that the 
value of these items will be approximately the same for engineering 
students and the comparison group of working craftsmen. 

Also, earnings foregone by students are not included, since 
the effect of the foregone earnings shows up in the calculation of the 
difference in the present worth of the expected life earnings for 
engineers and craftsmen. However, where students work part-time 
while going to school, part-time earnings should be included in the 
calculations. 

The primary concern here is with the total cost of education, 
including those costs borne directly by the student and those costs 
defrayed from public or private sources. Such costs are labelled 
"cost to society" to differentiate them from the personal cost to the 
student or his family. Typical costs and earnings are illustrated 
(to scale) in Figure 28. Figure 29 compares the present worth of 
expected life earnings and educational costs for engineers and 
craftsmen, at different discount rates. This figure shows a difference in 
the total expected life-cycle earnings of the (1962 graduate) engineer 
and the craftsmen amounting to approximately $236,000. At a 
discount rate of 4½% this difference shrinks to $73,000. The 
intersection of the two curves indicates that the internal rate of return on 
an engineering education would be approximately 17%. 
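
The internal rate of return is the discount rate at which the two present-worth curves cross, that is, the rate at which the present worth of the engineer-minus-craftsman net flows is zero. The following Python sketch finds that rate by bisection. The function names and the net-flow figures are hypothetical, and end-of-year discounting is assumed for simplicity; it is not a reproduction of the calculation behind Figure 29.

```python
def present_worth_difference(net_flows, rate):
    """Present worth of yearly net flows (engineer minus craftsman)."""
    return sum(flow / (1.0 + rate) ** year for year, flow in enumerate(net_flows))


def internal_rate_of_return(net_flows, low=0.0, high=1.0, tolerance=1e-9):
    """Bisection search for the rate at which the present worth is zero."""
    f_low = present_worth_difference(net_flows, low)
    f_high = present_worth_difference(net_flows, high)
    if f_low * f_high > 0:
        raise ValueError("present worth does not change sign on the interval")
    while high - low > tolerance:
        mid = (low + high) / 2.0
        f_mid = present_worth_difference(net_flows, mid)
        if f_low * f_mid <= 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


# Simple check: pay 1000 now, receive 1100 one year later (rate close to 0.10).
simple = internal_rate_of_return([-1000.0, 1100.0])

# Invented flows: four years of net cost, then forty years of net gain.
hypothetical = internal_rate_of_return([-5000.0] * 4 + [3000.0] * 40)
print(round(simple, 4), round(hypothetical, 4))
```

Bisection is used here only because it is easy to verify; any root-finding method applied to the same present-worth difference would serve.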

At this point enough data have already been presented to 
perform some interesting macro-system studies. For example, if 
one wished to keep a specified difference between the expected 
present worth of an engineer's and a craftsman's life-cycle earnings 
and also maintain the same level of performance while the engineer 
is in college, but decrease the learning period from four years to 
three years, how much more could one afford to pay in educational 
costs? Or: 

\[ X(g, 4) = X(g, 3) \]

\[ \hat{W}(4) - W - V(4) = \hat{W}(3) - W - V(3) \]

\[ V(3) - V(4) = \hat{W}(3) - \hat{W}(4) \]



In a numerical solution, using y = 1962, a = 18, b = 62, and 
r = 4½%, the right-hand side of the last equation gives approximately 
$12,600, which is the expected worth of the average engineer's 
life-cycle income attributable to finishing school and starting to work one 
year earlier than is currently customary. Also, using the national 
average annual "cost to society" for an engineering education, 

\[ V(4) = \$6,400 \]

and 

\[ V(3) = \$6,400 + \$12,600 = \$19,000 \]

If D(t) is a uniform annual figure, and r = 4½%, 

\[ D = \$6,800 \]


Therefore, one could theoretically afford to spend up to 
$6,800 for each of the three years in an accelerated program, as 
compared to approximately $1,800 for each of the years in the normal 
four-year program. 
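
The step from an affordable present worth of costs to a uniform annual figure can be sketched in Python. This is an approximation under stated assumptions: the function name `annuity_factor` is hypothetical, the survival factor is ignored, and both mid-year and end-of-year payment timing are tried because the timing convention is not recoverable from the text.

```python
def annuity_factor(years, rate, mid_year=True):
    """Present worth of one dollar paid in each of `years` years."""
    offset = 0.5 if mid_year else 0.0
    return sum((1.0 + rate) ** -(m - offset) for m in range(1, years + 1))


# Affordable present worth of costs for the three-year program, from the text.
affordable_present_worth = 6400.0 + 12600.0

annual_mid_year = affordable_present_worth / annuity_factor(3, 0.045)
annual_end_year = affordable_present_worth / annuity_factor(3, 0.045, mid_year=False)
print(round(annual_mid_year), round(annual_end_year))
```

Both conventions give an annual figure in the neighborhood of the $6,800 quoted above; the remaining gap presumably reflects the survival factor and rounding, which this sketch leaves out.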

Another variation of this problem is to calculate the additional 
amount of resources one would be willing to commit to education if 
these additional expenditures resulted in a student getting an M. S. 
instead of a B. S. degree in four years. 

Somewhat more speculative, since it introduces the additional 
uncertainties of the relationship between school performance and 
subsequent professional performance, is the problem of calculating 
the additional amount of resources one would be willing to commit to 
education if these expenditures resulted in a student getting, say, 
an A average instead of a B average. 

The above examples are sufficient to indicate the range of 
problems that could be investigated. Full treatment of such problems 
is left to a later work, since the primary concern here is how to use 
the input-output data in an adaptive decision situation. 

C. University of California Data on Engineers 

In an adaptive decision situation, one should, of course, use 
the data which are most relevant to the specific situation. For 
example, planners in an engineering school could, as a starter, use 
the national median earning figures for forecasting the expected life- 
cycle earnings of their graduates if no specific data on the earnings 
of graduates from that school are available. Where additional 
information is available it should be used. An illustration of the use 
of additional data is given below for the case of the graduates from the 
Berkeley and Los Angeles Colleges of Engineering of the University 
of California. A difference between the reported national median 
earnings of engineers and the median earnings of University of 
California engineering graduates for the survey year 1962 was already 
noted in Figure 15. A plot of the unadjusted median annual earnings 
by year of graduation shown in Figure 30 reveals that the University 
of California median figures are consistently higher than the national 
median. The University of California figures were adjusted for 
change in dollar values and re-plotted in Figure 31. Since the 
Los Angeles campus of the University of California had its first 
engineering graduates in 1949, the earning curves do not extend 
beyond thirteen years of experience. Therefore, the general shape 
of the national expected life-cycle earning curve (Figure 19) is used 
along with the available curves on University of California engineer- 
ing graduates to project an expected life -cycle earning curve for 
the class of 1962. 

An idealized set of performance correction factors (shown by 
solid lines in Figure 32) was derived from a combination of the 
American Telephone and Telegraph Company data (Figure 13) and the 
available data, covering a shorter span of years, from the University 
of California. Also shown in Figure 32 (by dotted lines) are the two 
extreme estimates for the performance correction factors, i.e., 
first, where it is assumed that no correlation between school 
performance and subsequent earnings exists, and therefore all 
performance correction factors are equal to 1.0, and second, where 
Gifford's old results are used to estimate the correction factors. 

FIGURE 32: PERFORMANCE CORRECTION FACTORS 



The present worth of the expected life-cycle earnings is 
affected by differences in school performance scores, by the time it 
takes to complete the education, and by the discount rate. These 
three factors are used to modify the expected median life-cycle 
earnings for the University of California engineering graduate of the Class 
of 1962, and are displayed in Figures 33, 34, 35, and 36. In each figure, 
the left-hand diagram is based on the assumption of no correlation 
between school performance scores and subsequent earnings, the 
right-hand diagram is based on Gifford's extreme performance 
correction factors, and the middle diagram is based on the idealized 
University of California performance correction factors. The shaded 
areas indicate the range of values between the high and low estimate 
of the median expected life-cycle earnings (see Figure 31). 



Figures 33, 34, 35, and 36 present (for the 1962 engineering 
graduate from the University of California) the solution for (Ŵ - W) 
in the expression 

\[ {}_{n}X(g, t) = c(\hat{W} - W) - V \]

In the figures, the abscissa is t/T' and the right-hand 
ordinate is (Ŵ - W). 

The results shown in these figures will be utilized in the fol- 
lowing section where simulation will be made of an adaptive teaching 
situation using (g, t) data from an actual experiment with various 
decision rules, discount rates (r) and proportionality factors (c). 



SECTION VI 



SIMULATION OF AN ADAPTIVE 
DECISION STRUCTURE 

There is an unfortunate aspect to the type of adaptive system 
that has been described in the preceding sections: its validity can- 
not be tested directly. The method given above for specifying the 
output of an educational system either has face validity, or none at 
all. Also, the appropriateness of a decision rule cannot be tested 
directly, since the identical naive (or unlearned) students are not 
available again for testing with alternate decision rules. Even if 
matched groups of students are available, in order to compare 
various decision rules one must either abandon the central concept 
that educational experiments should be conducted so as to maximize 
S_n, or else engage in the bootstrap operation of using a 
super-decision rule (up one rung in the ladder of levels of adaptivity) in 
order to find out which is the best decision rule (where the super- 
decision rule and the decision rule are likely to be one and the same). 

There is a third, vicarious, alternative: use data from 
educational experiments which have been previously conducted 
without benefit of the criterion of maximizing S_n. The procedure for 
using existing data would be approximately as follows: take a random 
sample of size one from each category; convert the data into X scores; 
follow the specified decision rule in determining which category to 
take an observation from next; take a random sample of size one 
from this category, etc. This procedure could be repeated a number 
of times, and the distribution and expected value of S_n for a given 
decision rule could be determined and compared with the distribution 
and expected value of S_n for other decision rules. There is the 
further advantage that the results using the decision rule can be 
compared with the results obtained in the experiment for which the 
data were originally collected. 
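
The resampling procedure just described can be sketched in Python as follows. The sketch makes several assumptions that are not from the report: existing X scores are drawn without replacement, `highest_mean_rule` is a hypothetical placeholder and is not Rule 1, Rule 7, the minimax rule, or backwards induction, and the X scores in `pools` are invented.

```python
import random


def simulate_once(pools, decision_rule, budget, rng):
    """One pass: sample each category once, then follow the decision rule."""
    remaining = {k: list(v) for k, v in pools.items()}
    for scores in remaining.values():
        rng.shuffle(scores)  # random order stands in for random sampling
    observed = {k: [remaining[k].pop()] for k in remaining}
    used = len(observed)
    while used < budget:
        available = [k for k in remaining if remaining[k]]
        if not available:
            break
        choice = decision_rule(observed, available)
        observed[choice].append(remaining[choice].pop())
        used += 1
    # Sum of all X scores obtained in this pass.
    return sum(sum(v) for v in observed.values())


def highest_mean_rule(observed, available):
    """Hypothetical rule: sample next from the category with the best mean so far."""
    return max(available, key=lambda k: sum(observed[k]) / len(observed[k]))


def simulate(pools, decision_rule, budget, repetitions, seed=0):
    """Repeat the pass to estimate the distribution of the total."""
    rng = random.Random(seed)
    return [simulate_once(pools, decision_rule, budget, rng) for _ in range(repetitions)]


pools = {1: [10, 12, 9, 11], 2: [14, 8, 13, 15]}  # invented X scores
totals = simulate(pools, highest_mean_rule, budget=6, repetitions=200)
print(sum(totals) / len(totals))  # estimated expected total for this rule
```

Swapping in a different `decision_rule` function and comparing the resulting lists of totals corresponds to the comparison of decision rules described in the text.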

Since values of X have already been plotted for various 
combinations of g, t, and r for the engineering graduate of the 
University of California, it seemed most convenient to use data from 
an experiment conducted in an engineering school of the University 
of California. Furthermore, the experiment should have data on 
learning time and performance scores. Fortunately, the author had 
recently conducted an experiment which meets the above require- 
ments [48] . The purpose of the experiment had been to determine 
the effectiveness of different branching procedures for self- 
instructional material. The precise nature of the branching pro- 
cedure and the subject content for each category need not concern 
us for the simulation. However, it is of interest to note that a clear- 
cut decision could not be made in the original experiment as to which 
was the best category, since no category yielded the highest mean 
performance score and the lowest mean learning time. 



The appropriate model for this experiment is: 

\[ {}_{n}X(g, t) = c(\hat{W} - W) - V, \qquad c = \frac{T'(\ell)}{T} \]

in which each year's earnings enter \(\hat{W}\) through a term of the form 

\[ w\Bigl(h(g'),\; p\bigl(\hat{y},\; \tfrac{\mathrm{CPI}(y)}{\mathrm{CPI}(y'+m)}\,\$(y', m)\bigr)\Bigr) \]

summed over \( m = 1, \ldots, b - a - tT/T' \) and weighted by the discount 
factor and by the survival factor \( M(a + m - \tfrac{1}{2} + tT/T') \), 

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where 



T = 4 years (7200 hours) 

t = 0 (the experiment was conducted during the first week 
of the freshman year) 
a = 18 (assumed) 
b = 62 (assumed) 
y = 1962 (assumed) 
y = 1958 

For D(t, j, ℓ), the following estimates were obtained:

D(t, 1) = $0.025 t + $1.00
D(t, 2) = $0.025 t + $1.05

Since the g in the experiment are given in percent and the g'
shown in Figures 33, 34, 35, 36 are in letter grades, an
h-transformation is required. This transformation was obtained by
matching the relative frequency of reported grades for University of 
California engineering graduates with the relative frequency of the 
percentage scores obtained in the experiment. Then, combining the 
h-transformation with the w-transformation given by the heavy lines 
in Figure 32, it was found that 



In the experiment, T* was estimated at 100 minutes. However,
it is of interest to discover the effect of the choice of T* on the results;
therefore values of T* = 50, 100, and 200 minutes will be used in the
simulation. Also, r = 0.03, 0.045, 0.06, and 0.10 will be tried.

The original data and the calculated values of X(g, t) for the 
different T* and r combinations are shown in Table 6. Also shown 
for each combination are the Sn» 

In the simulation, the assumption is made that the cost of
measuring and recording (g, t) for each student is small compared to
X(g, t); therefore, instead of treating these costs as separate
quantities, they are included in the D(t, j, ℓ). Four decision procedures
are evaluated: Rule 1, Rule 7, the minimax rule, and the backwards-induction
procedure. Furthermore, each of the procedures is used
for two different total numbers of available students: one, corresponding
to the number of students available from π_1 and π_2 in the
original experiment, namely N = 58; two, corresponding to an
assumed larger number of available students. In the simulation, the
largest number of assumed available students that can be used is
limited by the maximum N for which the backwards-induction procedure
has been solved, namely N = 200. Since actual data are not
available for N = 200 students, the assumption is made that the (g, t)
measures on the students actually observed in the original experiment
are representative of the distribution of such measures for each of
the π_j, and that random selections from the sample population will be
approximately equivalent to random selections from the π_j.

Individual simulation runs were made with each of the decision 
procedures to check how the procedures behave in the particular 
rather than in the expected value sense. The results of these runs 
are shown in Appendix D. 

Expected values were obtained for each decision procedure by 
taking the average of 500 iterations of each problem situation. These 
results are shown below in Table 7 . 
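
The averaging step can be illustrated with a short Python sketch. It assumes, as the text does, that resampling with replacement from the observed scores approximates sampling from the populations. The function names, the toy scores, and the greedy rule (always sample the category with the higher sample mean after one forced draw from each) are assumptions made for illustration; they are not the rules tabulated in Table 7.

```python
import random
from statistics import mean

def greedy_run(scores, n_total, rng):
    """One run: one forced draw per category, then always draw from the
    category with the higher current sample mean. Returns S_n / n."""
    draws = {j: [rng.choice(xs)] for j, xs in scores.items()}
    for _ in range(n_total - len(scores)):
        leader = max(draws, key=lambda j: mean(draws[j]))
        draws[leader].append(rng.choice(scores[leader]))
    return sum(sum(xs) for xs in draws.values()) / n_total

def expected_value(scores, n_total, iterations=500, seed=0):
    """Estimate E(S_n / n) as the average over repeated simulated runs."""
    rng = random.Random(seed)
    return mean(greedy_run(scores, n_total, rng) for _ in range(iterations))

# Toy scores, purely illustrative (not the data of Table 6).
toy_scores = {1: [3.0, 6.0, 9.0], 2: [4.0, 7.0, 10.0]}
estimate_58 = expected_value(toy_scores, n_total=58)
estimate_200 = expected_value(toy_scores, n_total=200)
```

Fixing the seed makes the estimate repeatable; the default of 500 iterations follows the number of iterations stated in the text.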

Before examining the results of the simulation, attention
should be called to the small differences between the means of π_1 and
π_2 shown in Table 6. These differences are approximately 0.2 standard
deviations, and therefore different decision rules will not yield
vastly different results. For example, for r = 0.045 and T* = 100,
Table 6 reveals that X̄_1 = 6.60 and X̄_2 = 7.25, and the decision rule
must yield an E(S_n/n) somewhere between the two means. One
measure of the effectiveness of a decision rule is given by

$$\mathrm{Eff} = \frac{E(S_n/n) - \mu_1}{\mu_2 - \mu_1}$$

(assuming μ_2 > μ_1). Using the means given in Table 6 as approximations
of the μ_j, Table 7 reveals that for r = 0.045, T* = 100, and N = 200,
the Eff of Rule 1 is 42%, the Eff of Rule 7 is 75%, the Eff of the
Minimax Rule is 82%, and the Eff of the Backwards-Induction Rule
is 85%. However, the absolute net expected value, in dollars per
student, for the sub-unit of education investigated in the simulation,
differs very little from one decision rule to another. For
r = 0.045, T* = 100, and N = 200, these absolute values range from a
minimum of $6.85 for Rule 1 to a maximum of $7.15 for the
Backwards-Induction Rule, a difference of less than 5%! Since the basic data
used for obtaining the utilities are likely to have errors greater than
±5%, it appears that when the differences between the true means of
the π_j are small, the choice of the most effective decision rule will
not give interestingly better results than the choice of a less effective
decision rule. The above statement applies when the r, T*, and N
have been precisely determined.
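
As a worked check of the arithmetic, the sketch below takes the effectiveness measure to be the fraction of the gap between the two means that a rule recovers, Eff = (E(S_n/n) - μ_1)/(μ_2 - μ_1), and applies it to the rounded dollar figures quoted above. The function name is an assumption introduced here. Because the quoted dollar values are rounded to the nearest five cents, the percentages stated in the text are reproduced only approximately.

```python
def efficiency(expected_avg, mu_low, mu_high):
    """Fraction of the gap between the two population means that is
    recovered by a decision rule (0 = always worst, 1 = always best)."""
    return (expected_avg - mu_low) / (mu_high - mu_low)

mu_1, mu_2 = 6.60, 7.25            # means quoted from Table 6
rule_1_value = 6.85                # quoted minimum, dollars per student
backwards_induction_value = 7.15   # quoted maximum, dollars per student

eff_rule_1 = efficiency(rule_1_value, mu_1, mu_2)
eff_backwards = efficiency(backwards_induction_value, mu_1, mu_2)

# Relative spread in absolute value between the two rules.
relative_spread = (backwards_induction_value - rule_1_value) / rule_1_value

print(f"Rule 1 Eff: {eff_rule_1:.1%}")
print(f"Backwards-Induction Eff: {eff_backwards:.1%}")
print(f"Spread in absolute value: {relative_spread:.1%}")
```

The calculation shows the contrast discussed above: the two rules are far apart on the effectiveness scale, yet the dollar values they yield differ by less than 5%.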

Table 7 reveals that the choice of r and T* has a much
greater effect on the absolute values of E(S_n/n) than does the choice
of a decision rule, and therefore any systems analysis which requires
the use of the absolute value attributable to a given unit or sub-unit
of education will be greatly affected by the choice of r and T*. For
example, if the administrator of an educational system observed
that the E(S_n/n) was approximately $7.00 per student when comparing
two different teaching methods, having assumed an r = 0.045 and a
T* = 100, he would probably be inclined to continue the sequential
assignment of students to the two methods. However, if an r = 0.10
and a T* = 100 had been assumed, he might be confronted with an
E(S_n/n) of -$3.00, indicating that the inputs outweigh the expected returns,
in which case he would probably want to stop assigning students to the
two methods and consider new alternatives.

To conclude, Table 7 indicates that for moderate to large N,
the Backwards-Induction Rule yields better results than any of the
other rules considered, regardless of the choice of r and T*. Therefore,
this rule is recommended for use in adaptive educational
systems, particularly since the values of r and T* would be fixed for
all students involved in a given sequential assignment problem.



SECTION VII 
CONCLUSION 



Education is generally conceded to be a wealth or a utility 
producing process. It is also a process which traditionally has been 
shaped by intuitive rather than by analytical decisions. In the preced- 
ing sections, an attempt was made to show how an analytical adaptive 
decision structure can be built for educational systems. It was 
emphasized that such a structure rests on four cornerstones: a plan 
for gathering and using data; an explicit criterion function; a set of 
decision rules for achieving the criterion; and a utility function which 
relates system inputs and system outputs to a value scale outside of 
the system. 

The utility function developed in the preceding sections 
defines the output of an educational system as the increment in life- 
cycle productive output attributable to the educational experience for 
all individuals who have been part of the system. An approximate 
measure of the average increment in productive output can be obtained 
by comparing the earnings of two matched groups of individuals, one 
of which has had the educational experience, the other of which has not. 
Such comparisons are relatively precise for large blocks of education,
such as a college education versus no college education, and are less
precise for smaller units of education, such as a semester course in
a specific subject. The trend, over a number of past years, of the 
average increment in earnings of previous students is used to project 
the future expected increment in earnings of current students. For 
some educational experiences, such as the college training of profes- 
sional engineers, a correlation can be found between performance in 
school and subsequent life-cycle earnings. In these special cases, 
the expected increment in life-cycle earnings of a current student can
be adjusted by a school performance factor. The expected increment
in earnings is distributed over the productive life-cycle, a span of
perhaps forty to forty-five years. By discounting the future expected
earnings, a single present worth of the entire expected increment in
life-cycle earnings can be obtained. Similarly, a present worth of the
total expenditures made in providing a student with an educational 
experience can be obtained. The difference between the two present 
worths represents the present worth of the net expected output per 
student of the system. By discounting the expected increment in 
earnings for each year of the productive life-cycle back to the date on 
which a student entered the educational system, an economic value can 
be associated with the amount of time it takes a student to complete 
the educational experience. All other things being equal, a student 
who completes a unit of education in three years would have each of 
the annual expected increments in earnings discounted one year less 
than if he completed the unit of education in four years. Using this 
time-value factor, and the school performance factor, it is possible 
for the first time to evaluate the possible trade-off between the 
student's learning time and performance level. Discounting expected
earnings also reduces the effect of the uncertainties and errors that 
enter into the projection of future earnings. 
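
The time-value argument can be made concrete with a small calculation. The sketch below is a simplification under stated assumptions: the annual increment in earnings is taken as constant, survival factors and educational costs are ignored, the working span is fixed at forty years, and the dollar amount is invented for illustration. Only the discount rate of 0.045 is taken from the text; the function name is hypothetical.

```python
def present_worth(annual_increment, rate, years_in_school, working_years):
    """Present worth, at the date of entry into the educational unit, of a
    constant annual increment in earnings that begins when the unit is
    completed and lasts for working_years years."""
    return sum(
        annual_increment / (1.0 + rate) ** (years_in_school + k)
        for k in range(working_years)
    )

rate = 0.045          # one of the discount rates tried in the text
increment = 1000.0    # hypothetical annual increment in earnings, dollars

four_year = present_worth(increment, rate, years_in_school=4, working_years=40)
three_year = present_worth(increment, rate, years_in_school=3, working_years=40)

# Every annual increment is discounted one year less for the three-year
# student, so the two present worths differ by the factor (1 + rate).
gain = three_year - four_year
```

Under these assumptions, finishing one year sooner raises the present worth by exactly the factor (1 + r), which is the economic value attached to the year of learning time saved.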

The utility function is stated in sufficiently general terms so 
that the present worth of the expected increment in life -cycle produc- 
tive output need not be measured only in terms of earnings. It is 
conceivable that adequate economic measures can be found for such
things as the expected increment in national and individual security 
attributable to an educational experience, or in the indirect contribu- 
tions to the well-being of other individuals (say, from research 
discoveries), or for such indirect benefits as a pleasant work environment,
longer vacations, a healthier life and other currently
non-monetary benefits that may be attributable to an educational
experience. Evaluation of the non-monetary measures becomes more
important as the emphasis in a society shifts from monetary to non-monetary
rewards for productive output (partly as a result of different
tax rates for low and high earners).

Having established a plan for making an economic measure of 
the net output of an educational system, and having illustrated its use 
for University of California engineering students, the next important 
consideration is to establish an overall goal or criterion of perform- 
ance for the system. The criterion that has been suggested here is 
that an educational system should operate so as to maximize the sum 
of the increment in the net present worth of the expected life -cycle 
productive output of all of the students who are being educated in the 
system. If one has prior knowledge of the costs and the expected 
gross outputs associated with different curricula or pedagogical 
techniques, then a straightforward input -output analysis can be made 
and that curricular configuration or those pedagogical techniques 
employed which will yield the maximum sum of the expected net out- 
puts. One example where the costs and expected gross outputs could 
be readily anticipated is in a comparison of two -semester four-year 
college systems versus three -semester three -year college systems. 
However, in most situations of interest, accurate prior knowledge of 
the costs and expected gross outputs for different curricular or 
pedagogical techniques is not available. Therefore, some exploration 
or information gathering is necessary. If such exploration consists 
in trying different teaching methods or course content, then some 
students will be exposed to methods or content which may be inferior 
to other methods or curricular content, in that they yield lower 
present worths of the net increment in expected life -cycle productive 
output for those students. There is a trade-off between the probable
loss attributable to assigning some students to inferior regimens
during the information gathering phase, and the probable loss attributable
to the failure to gather enough information as to which would
be the best regimen for all future students. Therefore, decision
rules are needed for assigning students to available curricular
configurations or pedagogical methods in such a way as to meet the
criterion of maximizing the sum of the net output of all students going
through the system.
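
The trade-off can be illustrated numerically with a simple two-stage scheme: assign a fixed number of students to each regimen, then send all remaining students to the regimen with the higher observed mean. The sketch below is an assumption-laden illustration, not one of the procedures developed in the text. The normal model, the function name, and the standard deviation of 3.25 (chosen so that the difference between the means is about 0.2 standard deviations) are all assumptions introduced here.

```python
import random

def two_stage_average(mu, sigma, n_total, m_explore, runs, seed=0):
    """Average net output per student when m_explore students are first
    assigned to each regimen and every remaining student is assigned to
    the regimen with the higher observed mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        run_sum = 0.0
        sample_means = []
        for mean_j in mu:                       # information gathering phase
            xs = [rng.gauss(mean_j, sigma) for _ in range(m_explore)]
            run_sum += sum(xs)
            sample_means.append(sum(xs) / m_explore)
        best = sample_means.index(max(sample_means))
        remaining = n_total - m_explore * len(mu)
        run_sum += sum(rng.gauss(mu[best], sigma) for _ in range(remaining))
        total += run_sum / n_total
    return total / runs

# Too little exploration risks committing to the inferior regimen;
# too much exploration assigns many students to it unnecessarily.
for m in (1, 5, 20, 50):
    value = two_stage_average((6.60, 7.25), 3.25, n_total=200,
                              m_explore=m, runs=500)
    print(m, round(value, 3))
```

Comparing the printed averages for small and large exploration phases shows the two kinds of probable loss described above pulling in opposite directions.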

A number of possible decision rules have been examined in 
the preceding sections. For the case where no prior information 
exists as to the distribution of expected net outputs, some qualitative 
results have been obtained for specifying the set of "forced choices" 
first suggested by Robbins [10] in his statement of the sequential 
assignment problem. For the case where the distribution of expected 
net outputs is known to be normally distributed, a method has been 
developed here for including the cost of making observations on 
student performance during the information gathering period in a two- 
stage sequential decision procedure. Of most interest was the 
development in Section III of a multi-stage or continuous sequential 
decision rule for use with normally distributed expected net outputs. 
Since records are ordinarily kept on all students in an educational
system, and not only on the first group of students who are assigned
to a specific curriculum, the multi-stage sequential assignment procedure
is most appropriate. Where records on student performance
are a necessary part of the system for reasons other than their use 
in a decision process, or where the cost of obtaining such records is 
very small compared to the net output, then the multi-stage sequen- 
tial decision process gives better results than any other process. 

The solution to the multi-stage sequential assignment problem was
accomplished by a backwards induction, using numerical techniques
to solve the multiple integrals that arise in the problem.
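
The idea of backwards induction can be shown on a much simpler problem than the one solved in the text. The sketch below swaps in a two-regimen problem with success/failure outcomes and uniform prior distributions, so that the multiple integrals reduce to sums; it is not the normal-distribution solution described above, and the function name and state layout are assumptions introduced here. Starting from the last student, for whom no future remains, the value of every earlier state is obtained from the values of the states that can follow it.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(s1, f1, s2, f2, remaining):
    """Maximum expected number of future successes when `remaining`
    students are still to be assigned, given s successes and f failures
    observed so far on each of the two regimens (uniform priors)."""
    if remaining == 0:
        return 0.0
    best = 0.0
    for arm in (1, 2):
        s, f = (s1, f1) if arm == 1 else (s2, f2)
        p = (s + 1) / (s + f + 2)      # posterior mean success probability
        if arm == 1:
            win = value(s1 + 1, f1, s2, f2, remaining - 1)
            lose = value(s1, f1 + 1, s2, f2, remaining - 1)
        else:
            win = value(s1, f1, s2 + 1, f2, remaining - 1)
            lose = value(s1, f1, s2, f2 + 1, remaining - 1)
        best = max(best, p * (1.0 + win) + (1.0 - p) * lose)
    return best

# Expected successes per student under the optimal rule, for several N.
for n in (1, 2, 10, 30):
    print(n, round(value(0, 0, 0, 0, n) / n, 4))
```

The memoized recursion visits each state once, which is the same economy that makes the backwards-induction computation feasible, although the number of states still grows rapidly with N.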

In the course of developing the framework for the adaptive 
decision structure for educational systems, a number of points arose 
which seem to warrant further investigation in order to improve the 
structure or extend its usefulness. First, in its current state of 
development, the multi-stage sequential assignment problem requires 
a separate set of calculations for each different estimated number, N, 
of students who will be going through a specified educational experience.
For very large N, such computations can be excessively time
consuming, even on the fastest available digital computer. Overall
computation time could be reduced if the solution is carried out on a
hybrid analog-digital computer.

Another fruitful avenue of investigation is to try to find a 
general solution in terms of N. Since the solution for different N 
results in surfaces which appear to have some regular features, such 
a general solution seems feasible. 

The multi-stage sequential assignment problem has only been 
solved here for the case where the distributions are Gaussian and 
where the ratios of the variances are known. The solution can be 
further extended to include the case where the ratios of the variances 
are not known, and also to non-Gaussian distributions. However, it is 
felt that such extension will be of more interest in adaptive decision 
problems which arise outside of the context of the educational systems 
that were considered here. 

Second, the utility function developed in Sections IV and V can
be considerably enhanced by:

a. Careful studies to reveal those factors (in addition to
school grades) which can be measured either before
or while a student is engaged in an educational experience
and which correlate with subsequent life-cycle
productive output level.

b. More specific data on the life-cycle productive outputs
of carefully matched "educated" and "non-educated"
groups.

c. A means for including non-monetary indications of
productive output.

d. The development of school performance measures which
use an absolute scale, rather than such relative scales
as obtained from the familiar bell-shaped curve. Some
states have Regents' examinations and some professional
schools have terminal examinations which are steps in
the desired direction.

e. The accumulation of data on life-cycle productive outputs
of students who have been exposed to different combinations
of sub-units of a given educational program or have
been exposed to different pedagogical procedures.

Even though the additional research outlined above would 
enhance the usefulness of the decision structure, it is possible to use 
the existing framework for some significant input-output analyses of 
educational systems, and it is also possible to inaugurate an adaptive 
decision procedure in some specific cases, such as in engineering 
education. Within the framework of the adaptive structure, it should 
be possible to make rational decisions on the amount of resources to 
allocate to the development of instructional material and on techniques 
that would permit a gradual shift from the lock-step grouping of
students in semester length courses to a flexible scheduling scheme in
which each student would progress through an educational program as
fast as possible, consistent with his own needs and the needs of the
world in which he will some day become a productive member.



APPENDIX A
SAKI 



A student using "SAKI" views an exercise line consisting of 
alpha-numeric characters which are illuminated one at a time, each 
for a different length of time. Simultaneously, the student attempts 
to replicate the characters by depressing the keys on a key-punch 
machine. A separately illuminated display of the keyboard layout 
indicates to the student the correct key to depress at the same time 
that a particular exercise character is being illuminated. This help- 
ful information may be withheld, either completely or partially. If 
completely withheld, the keyboard layout display lamps are not illu- 
minated; if partially withheld, these lamps are illuminated after a 
delay period, i.e., some milliseconds after the exercise character
has been illuminated. If the subscript "j" identifies a particular
exercise line (4 lines used in SAKI), and the subscript "i" identifies
the position of a character on a line (24 positions), then T_ji represents
the interval of time allowed for illuminating the i-th character
on the j-th exercise line, and E represents the delay time for illuminating
the corresponding character in the helpful keyboard layout
display. The symbols given here are those used by Pask.

According to Pask, a measure, S_ji(t) (temporarily stored in
the device as a potential), is obtained by:

a. Determining whether the response is correct or incorrect.
Incorrect responses are arbitrarily assigned an S_ji(t) value
of minus one.

b. For correct responses, S_ji(t) is the difference between
the time allowed for illuminating the ji-th character on
the exercise line and the actual response time.

c. S_ji(t) will have an initial value of zero and a value of one at
the end of the training process.

d. 1 ≥ S_ji(t) ≥ 0. (A requirement which appears to contradict
a and c.)

Furthermore, an average value of the quantities S_ji(t), called
θ, is obtained. θ would therefore have an initial value of zero and a
final value of one.

A storage condenser is provided for each ji character and the
potential at any instant on a condenser may be designated by an a_ji
value. Initially, a charge of value "u" is placed on each condenser.
If no move is made, or until a move is made for each ji-th character,
the condenser is discharged exponentially through a high resistance.
If a move is made, the condenser is charged through a resistance for
a fixed time, t', by a potential, S_ji(t). At the end of the training
process the a_ji(t) should all have a value of one.

To recapitulate,

$$S_{ji}(t) = \begin{cases} T_{ji}(t-1) - \tau_{ji}(t) & \text{for correct response} \\ -1 & \text{for incorrect response} \end{cases}$$

where τ is response time

† 1 ≥ S_ji(t) ≥ 0 (probably true only for correct response)

θ = avg S_ji(t) over all t

† T_ji(t) = (m + θ)(a_ji) + u; 0 < u ≤ 1, 0 < m ≤ 1

* E = ν(a_ji); 0 < ν ≤ 1

† Inferred from verbal descriptions
* Explicitly defined by Pask

[The two expressions for a_ji(t) printed at this point are not
recoverable from the scan. The legible fragments show a general form in
which a_ji(t-1) is multiplied by exponential decay factors in R and C
and a term in S_ji(t) is added, followed, for fixed t', by a form linear
in a_ji(t-1) and S_ji(t) with a constant K.]

where m, u, ν, R, C, K and (S = -1) are all arbitrary constants.

The initial values are:

S_ji(0) = 0
θ(0) = 0
a_ji(0) = u ≤ 1
T_ji(0) = (m + θ)(a_ji) + u = u(m + 1)
E(0) = ν(a_ji) = νu

The final values are:

θ(f) = 1
a_ji(f) = 1
As practice occurs:

i T_ji(t) should diminish

ii E(t) should increase

Assume that the correct responses are made to the first t characters,
with τ_ji(t) = T_ji(t-1) in each case. Then:

S_ji(t) = 0
a_ji(t) < a_ji(0)
i' T_ji(t) < T_ji(0)
ii' and E(t) < E(0)

If now, at t + 1, τ_ji(t+1) < T_ji(t-1), i.e., a response is made in less
than the allowed time, then:

S_ji(t+1) > 0 ≥ S_ji(t)
θ(t+1) > 0 ≥ θ(t)
a_ji(t+1) > a_ji(t)
i'' T_ji(t+1) > T_ji(t)
ii'' and E(t+1) > E(t)

But note that ii' violates condition ii, and i'' violates condition i.
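
The difficulty can also be seen by direct substitution. The sketch below reads the allowed time as T = (m + θ)a + u and the help delay as E = νa, an interpretation of the printed relations that reproduces the stated initial value T(0) = u(m + 1). The function names and the particular constants are assumptions chosen only for illustration.

```python
def allowed_time(m, theta, a, u):
    """Allowed illumination time, read here as T = (m + theta) * a + u."""
    return (m + theta) * a + u

def help_delay(nu, a):
    """Delay before the keyboard display is lit, read here as E = nu * a."""
    return nu * a

m, u, nu = 0.5, 0.5, 0.5          # arbitrary constants in (0, 1]

# Initial values: theta = 0 and a = u.
t_initial = allowed_time(m, theta=0.0, a=u, u=u)     # equals u * (m + 1)
e_initial = help_delay(nu, a=u)

# Final values: theta = 1 and a = 1.
t_final = allowed_time(m, theta=1.0, a=1.0, u=u)     # equals m + 1 + u
e_final = help_delay(nu, a=1.0)

print(t_initial, t_final)   # allowed time at the start and at the end
print(e_initial, e_final)   # help delay at the start and at the end
```

Under this reading the allowed time at the end of training exceeds its initial value for any admissible m and u, since u(m + 1) ≤ m + 1 < m + 1 + u, which is consistent with the conflict with condition i noted above; the help delay, by contrast, does increase as condition ii requires.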



TABLE C-1

EARNINGS OF ENGINEERS

FOOTNOTES FOR TABLE C-1

(a) Source: Employment and Earnings in the Engineering Profession
1929-34. U.S. Dept. of Labor, Bureau of Labor Statistics,
Bulletin No. 682, 1941. Table 64. Data in the above study
were collected by mail questionnaires. 52,829 returns were
used in the analysis. The sample was assumed to be representative
of all engineers in the U.S. The original figures
reflected "monthly earned median income from engineering
work for time actually employed". These have been converted
here to yearly figures. Some of the figures on Table C-1 are
shown bounded by an upper and a lower dash. These upper and
lower dashes indicate the range of the grouping of years of
experience in the original report. For example, the fourth
entry in the first column, 1929, is 3144, and it has an upper
dash at year 5 and a lower dash at year 8. Thus the figure
3144 is the median of the group with 5 to 8 years of experience.
The last figure in the columns is the salary for that year of
experience and beyond.

(b) Source: Employment Outlook for Engineers. U.S. Dept. of
Labor, Bureau of Labor Statistics, Bulletin No. 968, 1949,
Table D-13. Data were collected by mail questionnaires. The
sample was assumed to be representative of all U.S. engineers.
The figures available in this report are for median base
monthly salaries for the different engineering specialties, by
years of experience. To obtain a composite figure for all
engineers, the different specialties were summed across at
each year level of experience and the average obtained. Since
the proportion of the different specialties in the total sample
was not the same, an attempt was made to weight the different
specialties proportionately in obtaining the composite figure.
For the survey year, 1946, both weighted and unweighted
composite figures were found and plotted on a graph. The
curves were practically the same. Hence only the unweighted
composite figures were calculated and these converted into
yearly earnings.

(c) Source: Professional Income of Engineers, 1960. Engineers
Joint Council, New York. Page 13. Data were collected by
mail questionnaires. The sample was assumed to be representative
of all U.S. engineers. Figures reflect "median
annual base salary including cost of living allowance and bonus
if considered part of salary". Figures in the original report
were listed by years since B.S. degree. Here it is assumed
that the year of completion of B.S. degree was the year of
entry into work. Beyond the 10th year of experience salary
figures are listed every five years. These are for terminal
years and not for grouped years as in the case of (a) and (b)
above.

(d) Source: Professional Income of Engineers, 1962. Engineers
Joint Council, New York. Page 15. Same comments as for (c)
above.

For the years 1908-1912, the cost of living index (Federal Reserve Bank of N.Y.) with base year
1913 = 100 was adjusted to agree with the 1947-49 = 100 base.




TABLE C-3

SURVIVAL FACTORS

            Professional Workers         Skilled Workers
 (a)        (b)            (c)           (b)            (c)
 Age     Deaths per     Survival      Deaths per     Survival
          100,000        Factor        100,000        Factor

  18         95          .99853          138          .99862
  19         95          .99694          138          .99724
  20         95          .99599          139          .99585
  21         96          .99503          139          .99446
  22         96          .99407          140          .99306
  23         95          .99312          140          .99166
  24         95          .99217          141          .99025
  25         94          .99123          142          .98883
  26         93          .99030          143          .98740
  27         93          .98937          144          .98596
  28         95          .98842          148          .98448
  29        104          .98738          157          .98291
  30        114          .98624          167          .98124
  31        125          .98499          176          .97948
  32        135          .98364          184          .97764
  33        145          .98219          200          .97564
  34        155          .98064          225          .97339
  35        175          .97889          250          .97089
  36        200          .97689          275          .96814
  37        220          .97469          300          .96514
  38        250          .97219          325          .96189
  39        275          .96944          375          .95814
  40        300          .96644          415          .95399
  41        350          .96294          450          .94949
  42        400          .95894          500          .94449
  43        450          .95444          550          .93899
  44        525          .94919          625          .93274
  45        575          .94344          675          .92599
  46        650          .93694          750          .91849
  47        725          .92969          825          .91024
  48        800          .92169          925          .90099
  49        980          .91189         1000          .89099
  50       1000          .90189         1100          .87999
  51       1100          .89089         1225          .86774
  52       1225          .87864         1425          .85349
  53       1350          .86514         1500          .83849
  54       1475          .85039         1650          .82199
  55       1600          .83439         1775          .80424
  56       1750          .81689         1925          .78499
  57       1922          .79767         2081          .76418
  58       2075          .77692         2250          .74164
  59       2225          .75467         2450          .71718
  60       2425          .73042         2650          .69069
  61       2650          .70392         2900          .66168
  62       2886          .67506         3137          .63031

$$(c)_t = 1 - \sum_{x=18}^{t} \frac{(b)_x}{100{,}000}$$

Source: Inter- and extra-polated from Table 2 in Moriyama, I. M. and Guralnick.
Occupational and Social Class Differences in Mortality. Trends and Differentials
in Mortality. Milbank Memorial Fund, New York, 1956.
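
The relation between columns (b) and (c) can be checked numerically. The sketch below assumes that the survival factor at age t is one minus the cumulative deaths per 100,000 from age 18 through age t, a reading inferred from the tabulated values rather than stated legibly in the source; the function name is hypothetical. The first five skilled-worker entries are used as the check.

```python
def survival_factors(deaths_per_100k, start_age=18):
    """Survival factor by age, taken as 1 minus the cumulative deaths
    per 100,000 from start_age through the given age."""
    factors = {}
    cumulative = 0
    for offset, deaths in enumerate(deaths_per_100k):
        cumulative += deaths
        factors[start_age + offset] = 1.0 - cumulative / 100_000
    return factors

# Deaths per 100,000 for skilled workers, ages 18 to 22, from column (b).
skilled = survival_factors([138, 138, 139, 139, 140])

for age, factor in skilled.items():
    print(age, f"{factor:.5f}")
```

For these ages the computed factors agree with the tabulated skilled-worker values of column (c) to five decimal places.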



TABLE C-4

EARNINGS OF CRAFTSMEN, FOREMEN
AND KINDRED WORKERS

Age          1946      1949      1951      1955      1959      1961

14-24          -                 2684*
25-34        2202*               3592 b
35-44        2629*               3913 b
45-54        2753*               3731 b
55-64        2456*               3544 bc
over-all
median       2433 h    3114 d    3627 b    4423 e    5355*     5640 g

* From unpublished data. Bureau of the Census, U.S. Dept. of
Commerce.

a From Miller, H. P., The Income of the American People.
Wiley, 1955 (Table 25, page 54).

b Bureau of the Census, Current Population Reports,
Consumer Income, Series P-60, No. 11 (Table B).

c For age group 55 and beyond.

d Bureau of the Census, Current Population Reports,
Consumer Income, Series P-60, No. 7 (Table 19).

e Bureau of the Census, Current Population Reports,
Consumer Income, Series P-60, No. 23 (Table 5).

f Bureau of the Census, Current Population Reports,
Consumer Income, Series P-60, No. 35 (Table 25).

g Bureau of the Census, Current Population Reports,
Consumer Income, Series P-60, No. 39 (Table 29).

h Bureau of the Census, Current Population Reports,
Consumer Income, Series P-60, No. 3 (Table 16).



TABLE C-5

EARNINGS OF ALL U.S. MALES

Age          1946 a    1949 b    1951 c    1955 d    1959 e    1961 f

14-19          406       410       434       416       411       399
20-24         1247      1726      2259      2223      2612      2654
25-34         2098      2754      3288      3886      4774      5045
35-44         2535      2951      3617      4255      5320      5726
45-54         2575      2751      3280      4138      4852      5321
55-64         2285      2366      2840      3440      4190      4597
65 +          1625      1016      1008      1337      1576      1758
over-all
median        2134      2346      2952      3354      3996      4189

Source: Bureau of the Census, Current Population Reports,
Consumer Income, Series P-60, Nos. (for a) 3,
Table 10; (for b) 35, Table G; (for c) 11, Table 3;
(for d) 23, Table 3; (for e) 35, Table 23; (for f) 39,
Table 25.



TABLE D-1a

SIMULATION WITH RULE 1 - FIRST RUN

μ_1 = 6.60        μ_2 = 7.25

  n     j      X_nj     S_n/n    X-bar(1)   X-bar(2)

  1     1      5.98     5.98      5.98       0.00
  2     2      9.04     7.51      5.98       9.04
  3     2      8.84     7.96      5.98       8.94
  4     2     -0.02     5.96      5.98       5.95
  5     1      3.31     5.43      4.65       5.95
  6     2      6.12     5.55      4.65       6.00
  7     2      9.01     6.04      4.65       6.60
  8     2      5.92     6.03      4.65       6.49
  9     2      8.92     6.35      4.65       6.84
 10     2      6.22     6.34      4.65       6.76
 11     2      6.07     6.31      4.65       6.68
 12     2      8.80     6.52      4.65       6.90
 13     2      5.83     6.47      4.65       6.80
 14     2      3.20     6.24      4.65       6.50
 15     2      6.02     6.22      4.65       6.46
 16     2      8.97     6.39      4.65       6.64
 17     2      5.62     6.35      4.65       6.57
 18     2     11.87     6.66      4.65       6.91
 19     2     11.81     6.93      4.65       7.19
 20     2      6.00     6.88      4.65       7.13
 21     2      5.98     6.84      4.65       7.07
 22     2      6.07     6.80      4.65       7.02
 23     2      8.86     6.89      4.65       7.11
 24     2      0.39     6.62      4.65       6.80
 25     2      6.04     6.60      4.65       6.77
 26     2     11.82     6.80      4.65       6.98
 27     2      6.03     6.77      4.65       6.94
 28     2     15.13     7.07      4.65       7.26
 29     2      3.20     6.94      4.65       7.11
 30     2      3.20     6.81      4.65       6.97
 31     2     11.87     6.98      4.65       7.14
 32     2      8.84     7.03      4.65       7.19
 33     2      6.22     7.01      4.65       7.16
 34     2      6.22     6.99      4.65       7.13
 35     2      5.83     6.95      4.65       7.09
 36     2     11.82     7.09      4.65       7.23
 37     2      5.98     7.06      4.65       7.20
 38     2      6.12     7.03      4.65       7.17
 39     2      6.07     7.01      4.65       7.14
 40     2      6.04     6.99      4.65       7.11
 41     2      8.92     7.03      4.65       7.16
 42     2      6.12     7.01      4.65       7.13
 43     2      5.92     6.99      4.65       7.10
 44     2      6.22     6.97      4.65       7.08
 45     2      6.04     6.95      4.65       7.06
 46     2      9.04     6.99      4.65       7.10
 47     2      6.22     6.98      4.65       7.08
 48     2      6.22     6.96      4.65       7.06
 49     2      9.01     7.00      4.65       7.10
 50     2      6.00     6.98      4.65       7.08
 51     2      6.00     6.96      4.65       7.06
 52     2      6.04     6.95      4.65       7.04
 53     2      0.39     6.82      4.65       6.91
 54     2      6.22     6.81      4.65       6.90
 55     2     -0.02     6.69      4.65       6.77
 56     2      6.00     6.63      4.65       6.75
 57     2     11.87     6.77      4.65       6.84
 58     2      6.07     6.76      4.65       6.83




TABLE D-1b

SIMULATION WITH RULE 1 - SECOND RUN

μ_1 = 6.60        μ_2 = 7.25

[Table body not recoverable from the scan.]



TABLE D-1c

SIMULATION WITH RULE 1 - THIRD RUN

μ_1 = 6.60        μ_2 = 7.25

  n     j      X_nj     S_n/n    X-bar(1)   X-bar(2)

  1     1      3.09     3.09      3.09       0.00
  2     2      6.12     4.61      3.09       6.12
  3     2      8.92     6.05      3.09       7.52
  4     2      8.80     6.74      3.09       7.95
  5     2      5.62     6.51      3.09       7.37
  6     2      9.04     6.94      3.09       7.70
  7     2      8.84     7.21      3.09       7.89
  8     2     -0.02     6.30      3.09       6.76
  9     2      6.04     6.28      3.09       6.67
 10     2      9.01     6.55      3.09       6.93
 11     2      5.92     6.49      3.09       6.83
 12     2     11.82     6.94      3.09       7.29
 13     2      6.22     6.88      3.09       7.20
 14     2      6.07     6.82      3.09       7.11
 15     2      6.03     6.77      3.09       7.03
 16     2      5.83     6.71      3.09       6.95
 17     2      3.20     6.51      3.09       6.72
 18     2      6.02     6.48      3.09       6.68
 19     2      8.97     6.61      3.09       6.81
 20     2     15.13     7.04      3.09       7.25
 21     2     11.87     7.27      3.09       7.48
 22     2     11.81     7.47      3.09       7.68
 23     2      6.00     7.41      3.09       7.61
 24     2      5.98     7.35      3.09       7.54
 25     2      6.07     7.30      3.09       7.48
 26     2      8.86     7.36      3.09       7.53
 27     2      0.39     7.10      3.09       7.26
 28     2      6.00     7.06      3.09       7.21
 29     2      3.20     6.93      3.09       7.07
 30     2      9.01     7.00      3.09       7.13
 31     2      5.62     6.96      3.09       7.08
 32     2      5.62     6.91      3.09       7.04
 33     2      6.22     6.89      3.09       7.01
 34     2      6.07     6.87      3.09       6.98
 35     2      6.07     6.85      3.09       6.96
 36     2      6.07     6.82      3.09       6.93
 37     2      5.83     6.80      3.09       6.90
 38     2      3.20     6.70      3.09       6.80
 39     2      6.07     6.69      3.09       6.78
 40     2      8.80     6.74      3.09       6.83
 41     2      8.97     6.79      3.09       6.89
 42     2      6.22     6.78      3.09       6.87
 43     2      9.01     6.83      3.09       6.92
 44     2      6.12     6.82      3.09       6.90
 45     2      8.84     6.86      3.09       6.95
 46     2      5.98     6.84      3.09       6.93
 47     2      5.98     6.82      3.09       6.91
 48     2      6.02     6.81      3.09       6.89
 49     2      8.92     6.85      3.09       6.93
 50     2     11.87     6.95      3.09       7.03
 51     2      5.98     6.93      3.09       7.01
 52     2      6.00     6.92      3.09       6.99
 53     2     11.87     7.01      3.09       7.08
 54     2      9.01     7.05      3.09       7.12
 55     2      8.80     7.08      3.09       7.15
 56     2      8.97     7.11      3.09       7.18
 57     2      3.20     7.04      3.09       7.11
 58     2     -0.02     6.92      3.09       6.99



TABLE D-2a 

SIMULATION WITH RULE 7 - FIRST RUN

Set B    3, 10, 28    5, 14, 36

μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1     3.09    3.09    3.09    0.00
  2    2     8.84    5.97    3.09    8.84
  3    2     5.98    5.97    3.09    7.41
  4    2     9.01    6.73    3.09    7.95
  5    1     5.97    6.58    4.53    7.95
  6    2     5.83    6.46    4.53    7.42
  7    2     6.07    6.40    4.53    7.15
  8    2     5.92    6.34    4.53    6.95
  9    2     6.12    6.32    4.53    6.83
 10    2     8.92    6.58    4.53    7.09
 11    2     8.80    6.78    4.53    7.28
 12    2     5.62    6.69    4.53    7.12
 13    2     9.04    6.87    4.53    7.29
 14    1     8.81    7.01    5.96    7.29
 15    2    11.81    7.33    5.96    7.67
 16    2    -0.02    6.87    5.96    7.08
 17    2     6.04    6.82    5.96    7.00
 18    2     6.00    6.77    5.96    6.94
 19    2     0.39    6.44    5.96    6.53
 20    2    11.82    6.71    5.96    6.84
 21    2     6.22    6.68    5.96    6.80
 22    2     6.07    6.66    5.96    6.77
 23    2     6.03    6.63    5.96    6.73
 24    2     8.86    6.72    5.96    6.83
 25    2     3.20    6.58    5.96    6.67
 26    2     6.02    6.56    5.96    6.64
 27    2     8.97    6.65    5.96    6.74
 28    2    15.13    6.95    5.96    7.07
 29    2    11.87    7.12    5.96    7.26
 30    2     3.20    6.99    5.96    7.11
 31    2    -0.02    6.77    5.96    6.85
 32    2     8.86    6.83    5.96    6.92
 33    2    -0.02    6.62    5.96    6.69
 34    2     5.83    6.60    5.96    6.66
 35    2     9.01    6.67    5.96    6.74
 36    1    11.82    6.81    7.43    6.74
 37    1     5.78    6.78    7.10    6.74
 38    1     6.25    6.77    6.96    6.74
 39    1     5.67    6.74    6.77    6.74
 40    1     6.12    6.73    6.69    6.74
 41    2    15.13    6.93    6.69    6.99
 42    2     8.97    6.98    6.69    7.05
 43    2     3.20    6.89    6.69    6.94
 44    2     0.39    6.75    6.69    6.76
 45    2     6.07    6.73    6.69    6.74
 46    2     8.92    6.78    6.69    6.80
 47    2     5.83    6.76    6.69    6.77
 48    2     6.03    6.74    6.69    6.75
 49    2     6.04    6.73    6.69    6.74
 50    2    11.81    6.83    6.69    6.86
 51    2     8.92    6.87    6.69    6.90
 52    2     8.92    6.91    6.69    6.95
 53    2     9.04    6.95    6.69    7.00
 54    2     6.02    6.93    6.69    6.98
 55    2    -0.02    6.81    6.69    6.83
 56    2     0.38    6.69    6.69    6.69
 57    1     8.87    6.73    6.94    6.69
 58    1     0.54    6.62    6.30    6.69
TABLE D-2b

SIMULATION WITH RULE 7 - SECOND RUN 



Set B    π₁: 3, 10, 28    π₂: 5, 14, 36

μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1     8.81    8.81    8.81    0.00
  2    2     9.01    8.91    8.81    9.01
  3    2     5.83    7.89    8.81    7.42
  4    1    11.82    8.87   10.32    7.42
  5    1     5.78    8.25    8.81    7.42
  6    1     6.25    7.92    8.17    7.42
  7    1     3.08    7.23    7.15    7.42
  8    2     6.07    7.09    7.15    6.97
  9    1     6.12    6.98    6.98    6.97
 10    2     5.92    6.87    6.98    6.71
 11    1     8.87    7.06    7.25    6.71
 12    1     0.54    6.51    6.41    6.71
 13    2     6.12    6.48    6.41    6.59
 14    1     5.98    6.45    6.37    6.59
 15    2     8.92    6.61    6.37    6.98
 16    2     8.80    6.75    6.37    7.24
 17    2     5.62    6.68    6.37    7.04
 18    2     9.04    6.82    6.37    7.26
 19    2     8.84    6.92    6.37    7.42
 20    2    -0.02    6.57    6.37    6.74
 21    2     6.04    6.55    6.37    6.69
 22    2     5.98    6.52    6.37    6.63
 23    2     0.39    6.26    6.37    6.19
 24    1     3.31    6.13    6.06    6.19
 25    2    11.82    6.36    6.06    6.56
 26    2     6.22    6.36    6.06    6.54
 27    2     6.07    6.35    6.06    6.51
 28    2     6.03    6.34    6.06    6.49
 29    2     8.86    6.42    6.06    6.61
 30    2     3.20    6.32    6.06    6.44
 31    2     6.02    6.31    6.06    6.42
 32    2     8.97    6.39    6.06    6.54
 33    2    15.13    6.65    6.06    6.91
 34    2    11.87    6.81    6.06    7.12
 35    2    11.81    6.95    6.06    7.31
 36    1     8.86    7.00    6.32    7.31
 37    2     6.00    6.98    6.32    7.26
 38    2     3.20    6.88    6.32    7.11
 39    2    11.82    7.00    6.32    7.28
 40    2     0.39    6.84    6.32    7.04
 41    2     6.22    6.82    6.32    7.01
 42    2     6.12    6.81    6.32    6.98
 43    2    11.82    6.92    6.32    7.13
 44    2     6.00    6.90    6.32    7.10
 45    2    11.87    7.01    6.32    7.24
 46    2     5.92    6.99    6.32    7.20
 47    2     8.97    7.03    6.32    7.25
 48    2     6.03    7.01    6.32    7.22
 49    2     6.00    6.99    6.32    7.19
 50    2     8.80    7.03    6.32    7.23
 51    2     5.83    7.00    6.32    7.19
 52    2     6.00    6.99    6.32    7.16
 53    2    11.87    7.08    6.32    7.28
 54    2     3.20    7.01    6.32    7.18
 55    2     8.84    7.04    6.32    7.22
 56    2     6.03    7.02    6.32    7.19
 57    2    11.82    7.11    6.32    7.29
 58    2     6.22    7.09    6.32    7.27
TABLE D-2c

SIMULATION WITH RULE 7 - THIRD RUN 



Set B    3, 10, 28    5, 14, 36

μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1     5.78    5.78    5.78    0.00
  2    2     5.83    5.81    5.78    5.83
  3    2     5.82    5.65    5.78    5.88
  4    2     6.12    5.92    5.78    5.96
  5    1     6.25    5.98    6.02    5.96
  6    1     3.09    5.50    5.04    5.96
  7    2     8.82    5.99    5.04    6.70
  8    2     8.80    6.34    5.04    7.12
  9    2     5.62    6.26    5.04    6.87
 10    2     9.04    6.54    5.04    7.18
 11    2     8.84    6.75    5.04    7.39
 12    2    -0.02    6.19    5.04    6.57
 13    2     6.04    6.18    5.04    6.51
 14    1     6.12    6.17    5.31    6.51
 15    2     9.01    6.36    5.31    6.74
 16    2     0.39    5.99    5.31    6.21
 17    2    11.82    6.33    5.31    6.64
 18    2     6.22    6.33    5.31    6.61
 19    2     6.07    6.31    5.31    6.58
 20    2     6.03    6.30    5.31    6.54
 21    2     8.86    6.42    5.31    6.68
 22    2     3.20    6.27    5.31    6.49
 23    2     6.02    6.26    5.31    6.46
 24    2     8.97    6.38    5.31    6.59
 25    2    15.13    6.73    5.31    7.00
 26    2    11.87    6.93    5.31    7.22
 27    2    11.81    7.11    5.31    7.42
 28    2     6.00    7.07    5.31    7.36
 29    2     5.98    7.03    5.31    7.30
 30    2     6.07    7.00    5.31    7.26
 31    2    11.87    7.16    5.31    7.43
 32    2     6.00    7.12    5.31    7.38
 33    2    -0.02    6.90    5.31    7.12
 34    2     8.92    6.96    5.31    7.18
 35    2     5.83    6.93    5.31    7.14
 36    1     8.87    6.98    6.03    7.14
 37    2     5.98    6.96    6.03    7.10
 38    2     6.00    6.93    6.03    7.07
 39    2     6.04    6.91    6.03    7.04
 40    2     6.02    6.89    6.03    7.01
 41    2     5.62    6.86    6.03    6.97
 42    2     5.83    6.83    6.03    6.94
 43    2     6.04    6.81    6.03    6.92
 44    2     8.80    6.86    6.03    6.97
 45    2     6.03    6.84    6.03    6.94
 46    2     9.01    6.89    6.03    6.99
 47    2     8.80    6.93    6.03    7.04
 48    2     5.62    6.90    6.03    7.00
 49    2     8.97    6.94    6.03    7.05
 50    2    11.82    7.04    6.03    7.15
 51    2     3.20    6.97    6.03    7.07
 52    2     0.39    6.84    6.03    6.93
 53    2     8.80    6.88    6.03    6.97
 54    2     6.00    6.86    6.03    6.95
 55    2     5.98    6.84    6.03    6.93
 56    2     6.04    6.83    6.03    6.91
 57    2     8.80    6.87    6.03    6.95
 58    2     9.04    6.90    6.03    6.99
TABLE D-3a

SIMULATION WITH MINIMAX RULE - FIRST RUN 



Assumed σ₁ = σ₂ = 3.0

μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1     8.81    8.81    8.81    0.00
  2    2    11.82   10.32    8.81   11.82
  3    1     5.78    8.81    7.30   11.82
  4    2     9.01    8.86    7.30   10.42
  5    1     6.23    8.33    6.94   10.42
  6    2     5.98    7.94    6.94    8.94
  7    1    11.81    8.50    8.16    8.94
  8    2     6.07    8.19    8.16    8.22
  9    1     8.81    8.26    8.29    8.22
 10    2    15.13    8.95    8.29    9.61
 11    1    11.82    9.21    8.88    9.61
 12    2     6.22    8.96    8.88    9.04
 13    1    11.77    9.18    9.29    9.04
 14    2     3.20    8.75    9.29    8.21
 15    1     3.34    8.39    8.55    8.21
 16    2     8.92    8.42    8.55    8.30
 17    1     3.23    8.12    7.96    8.30
 18    2     6.04    8.00    7.96    8.05
 19    1     8.74    8.04    8.04    8.05
 20    2     6.02    7.94    8.04    7.85
 21    1     8.71    7.98    8.10    7.85
 22    2     6.12    7.89    8.10    7.69
 23    1     8.79    7.93    8.16    7.69
 24    2     9.04    7.98    8.16    7.80
 25    1     3.09    7.78    7.77    7.80
 26    2     5.83    7.71    7.77    7.65
 27    1     0.54    7.44    7.25    7.65
 28    2     8.80    7.49    7.25    7.73
 29    1     5.97    7.44    7.17    7.73
 30    2    11.81    7.59    7.17    8.01
 31    1     5.76    7.53    7.08    8.01
 32    2     5.62    7.47    7.08    7.86
 33    1     6.25    7.43    7.03    7.86
 34    2    -0.02    7.21    7.03    7.39
 35    1     8.81    7.26    7.13    7.39
 36    2     8.86    7.30    7.13    7.47
 37    1     5.79    7.26    7.06    7.47
 38    2     8.84    7.30    7.06    7.55
 39    1     5.78    7.26    7.00    7.55
 40    2     6.03    7.23    7.00    7.47
 41    1     6.23    7.21    6.96    7.47
 42    2    11.87    7.32    6.96    7.68
 43*   2     5.92    7.29    6.96    7.60
 44    2     0.39    7.13    6.96    7.29
 45    2     6.07    7.11    6.96    7.24
 46    2     8.97    7.15    6.96    7.31
 47    2     6.00    7.12    6.96    7.26
 48    2     8.86    7.16    6.96    7.32
 49    2     5.98    7.14    6.96    7.27
 50    2     5.62    7.11    6.96    7.21
 51    2     5.92    7.08    6.96    7.17
 52    2     3.20    7.01    6.96    7.04
 53    2     8.86    7.04    6.96    7.10
 54    2     5.83    7.02    6.96    7.06
 55    2     6.00    7.00    6.96    7.03
 56    2     6.22    6.99    6.96    7.01
 57    2     9.04    7.02    6.96    7.06
 58    2     3.20    6.96    6.96    6.96



TABLE D-3b 

SIMULATION WITH MINIMAX RULE - SECOND RUN 





Assumed σ₁ = σ₂ = 3.0

μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1     6.23    6.23    6.23    0.00
  2    2     5.98    6.11    6.23    5.98
  3    1    11.81    8.01    9.02    5.98
  4    2     6.07    7.53    9.02    6.03
  5    1     8.81    7.78    8.95    6.03
  6    2    15.13    9.01    8.95    9.06
  7    1    11.82    9.41    9.67    9.06
  8    2     6.22    9.01    9.67    8.35
  9    1    11.77    9.32   10.09    8.35
 10    2     3.20    8.71   10.09    7.32
 11*   1     3.34    8.22    8.97    7.32
 12    1     3.23    7.81    8.15    7.32
 13    1     8.74    7.88    8.22    7.32
 14    1     8.71    7.94    8.28    7.32
 15    1     8.79    7.99    8.33    7.32
 16    1     3.09    7.69    7.85    7.32
 17    1     8.81    7.75    7.93    7.32
 18    1     5.97    7.66    7.78    7.32
 19    1     5.76    7.56    7.64    7.32
 20    1     6.25    7.49    7.55    7.32
 21    1     8.81    7.55    7.63    7.32
 22    1     5.79    7.47    7.52    7.32
 23    1     5.78    7.40    7.42    7.32
 24    1     6.23    7.35    7.36    7.32
 25    1     3.31    7.19    7.16    7.32
 26    1     5.98    7.14    7.10    7.32
 27    1     3.08    6.99    6.92    7.32
 28    1     3.11    6.86    6.75    7.32
 29    1     8.72    6.92    6.84    7.32
 30    1     5.67    6.88    6.79    7.32
 31    1     2.77    6.75    6.63    7.32
 32    1     8.86    6.81    6.72    7.32
 33    1     5.78    6.78    6.68    7.32
 34    1     8.74    6.84    6.76    7.32
 35    1     6.12    6.82    6.73    7.32
 36    1     0.54    6.64    6.53    7.32
 37    1     8.87    6.70    6.61    7.32
 38    1    11.82    6.84    6.77    7.32
 39    1     3.11    6.74    6.66    7.32
 40    1     8.74    6.79    6.72    7.32
 41    1     8.74    6.84    6.77    7.32
 42    1     8.74    6.89    6.83    7.32
 43    1     3.09    6.80    6.73    7.32
 44    1     8.74    6.84    6.78    7.32
 45    1     2.77    6.75    6.63    7.32
 46    1     5.97    6.74    6.66    7.32
 47    1     5.79    6.72    6.64    7.32
 48    1     3.34    6.65    6.57    7.32
 49    1     5.97    6.63    6.55    7.32
 50    1     3.08    6.56    6.48    7.32
 51    1    11.82    6.66    6.59    7.32
 52    1     5.79    6.65    6.58    7.32
 53    1     6.23    6.64    6.57    7.32
 54    1     3.08    6.57    6.50    7.32
 55    1     8.74    6.61    6.54    7.32
 56    1     3.31    6.55    6.48    7.32
 57    1     6.23    6.55    6.47    7.32
 58    1     8.74    6.59    6.52    7.32
TABLE D-3c

SIMULATION WITH MINIMAX RULE - THIRD RUN 



Assumed σ₁ = σ₂ = 3.0

μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1     8.81    8.81    8.81    0.00
  2    2    15.13   11.97    8.81   15.13
  3    1    11.82   11.92   10.32   15.13
  4    2     6.22   10.50   10.32   10.68
  5    1    11.77   10.75   10.80   10.68
  6    2     3.20    9.50   10.80    8.19
  7    1     3.34    8.62    8.94    8.19
  8    2     6.07    8.30    8.94    7.66
  9    1     3.23    7.74    7.80    7.66
 10    2     6.04    7.57    7.80    7.34
 11    1     8.74    7.67    7.96    7.34
 12    2     6.02    7.54    7.96    7.12
 13    1     8.71    7.63    8.06    7.12
 14    2     6.12    7.52    8.06    6.98
 15    1     8.79    7.61    8.16    6.98
 16    2     9.04    7.70    8.16    7.23
 17    1     3.09    7.42    7.59    7.23
 18    2     5.83    7.34    7.59    7.08
 19    1    11.81    7.57    8.02    7.08
 20    2     8.80    7.63    8.02    7.25
 21    1     5.97    7.55    7.83    7.25
 22    2     9.01    7.62    7.83    7.41
 23    1     5.76    7.54    7.66    7.41
 24    2     5.62    7.46    7.66    7.26
 25    1     6.25    7.41    7.55    7.26
 26    2    -0.02    7.13    7.55    6.70
 27    1     8.81    7.19    7.64    6.70
 28    2     8.86    7.25    7.64    6.86
 29    1     5.79    7.20    7.52    6.86
 30    2     5.98    7.16    7.52    6.80
 31    1     5.78    7.11    7.41    6.80
 32    2    11.82    7.26    7.41    7.11
 33    1     6.23    7.23    7.34    7.11
 34    2    11.87    7.37    7.34    7.39
 35    1     3.31    7.25    7.12    7.39
 36    2     5.92    7.21    7.12    7.31
 37    1     5.98    7.18    7.06    7.31
 38    2     0.39    7.00    7.06    6.95
 39    1     3.08    6.90    6.86    6.95
 40    2     6.07    6.88    6.86    6.90
 41    1     3.11    6.79    6.68    6.90
 42    2     8.97    6.84    6.68    7.00
 43    1     8.72    6.89    6.77    7.00
 44    2     6.00    6.87    6.77    6.96
 45    1     5.67    6.84    6.73    6.96
 46    2     6.33    6.82    6.73    6.92
 47    1     2.77    6.74    6.56    6.92
 48    2    11.81    6.84    6.56    7.12
 49*   2     8.84    6.88    6.56    7.16
 50    2     8.92    6.92    6.56    7.26
 51    2     6.07    6.91    6.56    7.21
 52    2     6.02    6.89    6.56    7.17
 53    2    -0.02    6.76    6.56    6.92
 54    2     6.02    6.75    6.56    6.86
 55    2     0.39    6.63    6.56    6.68
 56    2     6.04    6.62    6.56    6.66
 57    2     9.04    6.66    6.56    6.74
 58    2     6.03    6.65    6.56    6.71
TABLE D-4a

SIMULATION WITH BACKWARDS INDUCTION RULE - FIRST RUN



Assumed σ₁ = σ₂ = 3.0    * indicates observation from n b



Hj-7. 38 




1 

2 

3 

4 

5 

6 

7 

8 

9 

10 
11 
12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

34 

35 

36 

37 

38 

39 

40 

41 

42 

43 

44 

45 

46 

47 

48 

49 

50 

51 

52 

53 

54 

55 

56 

57 

58 



2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 



6.23 
5.98 

11.81 

8.81 

11.82 

11.77 

3.34 

3.23 
8.74 
8.71 

8.79 
3.09 
8.81 

5.97 
5.76 
6.25 
8.81 

5.79 
5.78 

6.23 
3.31 

6.07 

5.98 

3.08 

3.11 
8.72 
5.67 
2.77 

15.13 

6.22 

3.20 

8.92 

6.04 

6.02 

6.12 

9.04 

5.83 
8.80 
9.01 
5.62 

“ 0.02 

8.86 

8.84 
11.82 

11.87 

5.92 
0.39 
6.07 
8.97 
6.00 

6.03 
11.81 
15.13 

11.87 

6.04 

8.92 
8.92 
6.12 



6.23 
6.11 
8.01 
8.21 
8.93 
9.41 
8.54 
7.88 
7.97 

8.05 

8.12 

7.70 
7.78 
7.65 
7.53 
7.45 
7.53 
7.43 
7.35 
7.29 
7.10 

7.05 
7.01 
6.84 

6.70 
6.77 
6.73 
6.59 

6.89 
6.86 
6.75 
6.81 
6.79 
6.77 
6.75 
8.81 

6.79 
6.84 

6.90 
6.86 
6.70 
6.75 

6.80 

6.91 
7.02 
7.00 
6.86 

6.84 
6.88 
6.87 

6.85 
6.95 
7.10 

7.19 
7.17 

7.20 
7.23 

7.21 



6.23 
6.23 
9.02 
8.95 
9.67 
10.09 
8.97 

8.15 
8.22 
8.28 
8. 33 
7.85 
7.93 

7.78 
7.64 
7.55 

7.63 
7.52 
7.42 
7.36 

7.16 
7.16 
7.10 
6.92 
6.75 
6.84 

6.79 

6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 
6.63 

6.63 

6.63 



0.00 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
5.98 
6.03 
6.03 
6.03 
6.03 
6.03 
6.03 
6.03 
9.06 
8. 35 
7.32 
7.59 
7.37 
7.20 
7.08 

7.28 
7.15 

7.29 
7.42 
7.29 
6.80 
6.93 
7.04 
7.31 
7.55 
7.47 
7.13 
7.08 
7.17 
7.12 
7.07 
7.26 
7.55 
7.70 
7.65 
7.69 
7.73 
7.68 
TABLE D-4b

SIMULATION WITH BACKWARDS INDUCTION RULE - SECOND RUN 



Assumed σ₁ = σ₂ = 3.0    * indicates observation from d*

μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1     6.23    6.23    6.23    0.00
  2    2     6.07    6.15    6.23    6.07
  3    1     3.34    5.22    4.79    6.07
  4    2     6.12    5.44    4.79    6.10
  5    2     5.62    5.48    4.79    5.94
  6    2     5.98    5.56    4.79    5.95
  7    2     6.03    5.63    4.79    5.97
  8    2    -0.02    4.92    4.79    4.97
  9    1*    3.31    4.75    4.30    4.97
 10    1*    3.23    4.59    4.03    4.97
 11    2     8.92    4.99    4.03    5.53
 12    2     8.80    5.31    4.03    5.94
 13    2     9.01    5.59    4.03    6.28
 14    2    15.13    6.27    4.03    7.17
 15    2     8.86    6.45    4.03    7.32
 16    2     6.22    6.43    4.03    7.23
 17    2     3.20    6.24    4.03    6.92
 18    2     0.39    5.92    4.03    6.46
 19    2     6.02    5.92    4.03    6.43
 20    2     5.92    5.92    4.03    6.40
 21    2     6.07    5.93    4.03    6.38
 22    2    11.82    6.20    4.03    6.68
 23    2     6.04    6.19    4.03    6.65
 24    2     9.04    6.31    4.03    6.77
 25    2     8.97    6.42    4.03    6.87
 26    2    11.87    6.63    4.03    7.10
 27    2     5.83    6.60    4.03    7.04
 28    2     8.84    6.68    4.03    7.12
 29    2    11.81    6.86    4.03    7.31
 30    2     6.00    6.83    4.03    7.28
 31    2     6.22    6.81    4.03    7.22
 32    2    11.82    6.96    4.03    7.38
 33    2     0.39    6.77    4.03    7.14
 34    2     6.00    6.74    4.03    7.10
 35    2    -0.02    6.55    4.03    6.87
 36    2    15.13    6.79    4.03    7.13
 37    2     6.03    6.77    4.03    7.10
 38    2    11.82    6.90    4.03    7.24
 39    2    -0.02    6.72    4.03    7.03
 40    2     6.12    6.71    4.03    7.01
 41    2     5.98    6.69    4.03    6.98
 42    2     3.20    6.61    4.03    6.88
 43    2     8.97    6.66    4.03    6.93
 44    2    11.82    6.78    4.03    7.05
 45    2     8.84    6.83    4.03    7.10
 46    2    15.13    7.01    4.03    7.29
 47    2    11.87    7.11    4.03    7.40
 48    2     6.02    7.09    4.03    7.37
 49    2     6.12    7.07    4.03    7.34
 50    2     6.04    7.05    4.03    7.31
 51    2     6.03    7.03    4.03    7.28
 52    2    15.13    7.18    4.03    7.45
 53    2     6.07    7.16    4.03    7.42
 54    2     6.02    7.14    4.03    7.39
 55    2    11.82    7.23    4.03    7.48
 56    2     0.39    7.10    4.03    7.34
 57    2     6.00    7.09    4.03    7.32
 58    2     5.92    7.07    4.03    7.29



TABLE D-4c 

SIMULATION WITH BACKWARDS INDUCTION RULE - THIRD RUN 



μ₁ = 6.60    μ₂ = 7.25

  n   π_j    ₙX_j    Sₙ/n     ₙX̄₁     ₙX̄₂
  1    1    11.81   11.81   11.81    0.00
  2    2     8.80   10.31   11.81    8.80
  3    1     3.23    7.95    7.52    8.80
  4    2    11.87    8.93    7.52   10.34
  5    2     5.98    8.34    7.52    8.89
  6    2     6.07    7.96    7.52    8.18
  7    2    15.13    8.99    7.52    9.57
  8    2     6.22    8.64    7.52    9.02
  9    2     3.20    8.04    7.52    8.19
 10    2     8.92    8.13    7.52    8.28
 11    2     6.04    7.94    7.52    8.03
 12    1*    6.23    7.80    7.09    8.03
 13    2     6.02    7.66    7.09    7.83
 14    2     6.12    7.55    7.09    7.67
 15    2     9.04    7.65    7.09    7.79
 16    2     5.83    7.54    7.09    7.64
 17    2     6.03    7.45    7.09    7.52
 18    1*    0.54    7.06    5.46    7.52
 19    2     9.01    7.17    5.46    7.62
 20    2     5.62    7.09    5.46    7.50
 21    2    -0.02    6.75    5.46    7.06
 22    2     8.86    6.85    5.46    7.16
 23    2     8.84    6.93    5.46    7.25
 24    2    11.82    7.14    5.46    7.47
 25    2    11.81    7.33    5.46    7.68
 26    2     5.92    7.27    5.46    7.60
 27    2     0.39    7.02    5.46    7.29
 28    2     6.07    6.98    5.46    7.24
 29    2     8.97    7.05    5.46    7.31
 30    2     6.00    7.02    5.46    7.26
 31    2     3.20    6.89    5.46    7.11
 32    2     6.07    6.87    5.46    7.07
 33    2     6.03    6.84    5.46    7.03
 34    2     5.98    6.82    5.46    7.00
 35    2     5.98    6.79    5.46    6.97
 36    2     6.02    6.77    5.46    6.94
 37    2    11.81    6.91    5.46    7.08
 38    2     6.12    6.89    5.46    7.06
 39    2     6.12    6.87    5.46    7.03
 40    2    11.82    6.99    5.46    7.16
 41    2     6.12    6.97    5.46    7.14
 42    2     0.39    6.81    5.46    6.96
 43    2     8.80    6.86    5.46    7.01
 44    2    15.13    7.05    5.46    7.21
 45    2     5.62    7.02    5.46    7.17
 46    2     8.86    7.06    5.46    7.21
 47    2     5.98    7.03    5.46    7.18
 48    2     5.62    7.01    5.46    7.15
 49    2     5.92    6.98    5.46    7.12
 50    2     3.20    6.91    5.46    7.03
 51    2     8.86    6.95    5.46    7.07
 52    2     5.83    6.92    5.46    7.05
 53    2     6.00    6.91    5.46    7.03
 54    2     6.22    6.89    5.46    7.01
 55    2     9.04    6.93    5.46    7.05
 56    2     3.20    6.87    5.46    6.98
 57    2     6.07    6.85    5.46    6.96
 58    2     6.00    6.84    5.46    6.94
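
The derived columns in these simulation tables appear to follow directly from the population and sample-value columns: the overall running mean Sₙ/n, and a running mean for each of the two populations. The Python sketch below recomputes them for the first five trials of Table D-4c. It is an illustration only, not the program used for the original simulations; the function name `running_means` is hypothetical, and reporting 0.00 for a population that has not yet been sampled is an assumption taken from the first row of the tables.

```python
def running_means(populations, samples):
    """Recompute S_n/n and the per-population running means, row by row.

    populations: sequence of 1 or 2, the population sampled on each trial.
    samples:     sequence of observed sample values, one per trial.
    Returns a list of (n, population, x, S_n/n, mean_1, mean_2) tuples.
    """
    total = 0.0                  # S_n, the sum of all sample values so far
    sums = {1: 0.0, 2: 0.0}      # per-population sums
    counts = {1: 0, 2: 0}        # per-population sample counts
    rows = []
    for n, (pop, x) in enumerate(zip(populations, samples), start=1):
        total += x
        sums[pop] += x
        counts[pop] += 1
        # Assumption: an unsampled population is reported as 0.00,
        # as in the first row of each table.
        mean_1 = sums[1] / counts[1] if counts[1] else 0.0
        mean_2 = sums[2] / counts[2] if counts[2] else 0.0
        rows.append((n, pop, x, total / n, mean_1, mean_2))
    return rows


# First five trials of Table D-4c (population sampled, sample value).
pops = [1, 2, 1, 2, 2]
xs = [11.81, 8.80, 3.23, 11.87, 5.98]

for n, pop, x, overall, m1, m2 in running_means(pops, xs):
    print(f"{n:3d} {pop:3d} {x:8.2f} {overall:8.2f} {m1:8.2f} {m2:8.2f}")
```

Because the tabulated sample values are themselves rounded to two decimals, recomputed means can differ from the printed ones by about 0.01.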